mikelor / TsaThroughput

Monitors the TSA published statistics, downloads new PDF files, and saves them as .json
MIT License

cvtPdfToJson step does not seem to be recognizing some hourly cells #12

Closed. mikelor closed this issue 2 years ago.

mikelor commented 2 years ago

The cvtPdfToJson step seems to be creating duplicate entries for an hour, which corrupts the data.

For example, the 03.27.2019 data for ATL shows 7 Pax at the Main Checkpoint for the 01:00 hour and 16 Pax for the 02:00 hour. The TSAThroughputApp, however, creates two 01:00 nodes for the ATL Main Checkpoint instead of a 01:00 node and a 02:00 node.
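A quick way to find affected rows is to scan each checkpoint's hourly entries for repeated or missing hours. This is a hypothetical sketch: `find_hour_problems` is not part of the project, and it assumes the hours for one checkpoint on one day have already been pulled out of the cvtPdfToJson output as a list of `(hour, pax)` pairs, which may not match the actual JSON layout.

```python
from collections import Counter


def find_hour_problems(entries, expected_hours):
    """Report duplicated and missing hour labels for one checkpoint.

    entries: list of (hour_label, pax) tuples for one checkpoint on one day.
    expected_hours: hour labels the PDF row should contain, e.g. ["01:00", "02:00"].
    Both shapes are hypothetical, not the actual cvtPdfToJson output format.
    """
    counts = Counter(hour for hour, _ in entries)
    # Any hour label that appears more than once is a duplicate.
    duplicated = sorted(hour for hour, n in counts.items() if n > 1)
    # Any expected hour label that never appears is missing.
    missing = [hour for hour in expected_hours if hour not in counts]
    return duplicated, missing


# The ATL Main Checkpoint example from 03.27.2019, as described in this issue.
bad = [("01:00", 7), ("01:00", 16)]   # what the app currently produces
good = [("01:00", 7), ("02:00", 16)]  # what the PDF actually contains
print(find_hour_problems(bad, ["01:00", "02:00"]))   # (['01:00'], ['02:00'])
print(find_hour_problems(good, ["01:00", "02:00"]))  # ([], [])
```

Running a check like this over every airport and checkpoint would give a rough count of how many rows are affected.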

This throws off the numbers, and I'm not yet sure how many airports or rows are affected. The net result is an undercount: the cvtJsonToCsv step averages the two 01:00 entries, so in the example above the 01:00 hour shows up as (7 + 16) / 2 = 11.5 Pax and there is no 02:00 value, leaving 11.5 of the 23 passengers across the two hours.
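The arithmetic behind the undercount can be reproduced in a few lines. This is a hypothetical sketch, not the actual cvtJsonToCsv code: `average_by_hour` is an illustrative name, and the only assumption taken from this issue is that duplicate entries for the same hour are averaged into a single value.

```python
from collections import defaultdict


def average_by_hour(entries):
    """Collapse (hour_label, pax) entries to one value per hour by averaging.

    Mimics the averaging behaviour described in this issue; the real
    cvtJsonToCsv implementation may differ.
    """
    grouped = defaultdict(list)
    for hour, pax in entries:
        grouped[hour].append(pax)
    return {hour: sum(vals) / len(vals) for hour, vals in grouped.items()}


bad = [("01:00", 7), ("01:00", 16)]   # two 01:00 nodes, no 02:00 node
good = [("01:00", 7), ("02:00", 16)]  # what the PDF actually contains

print(average_by_hour(bad))   # {'01:00': 11.5}
print(average_by_hour(good))  # {'01:00': 7.0, '02:00': 16.0}
print(sum(average_by_hour(bad).values()), sum(average_by_hour(good).values()))  # 11.5 23.0
```

The 01:00 hour alone looks inflated (11.5 instead of 7), but because the 02:00 value disappears, the daily total for the checkpoint drops.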

mikelor commented 2 years ago

After researching this, it appears that Form Recognizer is not recognizing the 02:00 column. This is similar to the issue I reported to Microsoft for the 2.1 recognizer, but it appears to occur in this version as well: https://docs.microsoft.com/en-us/answers/questions/577259/form-recognizer-change-in-behavior-in-21-vs-20-rec.html
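Until the recognizer handles the 02:00 column, one possible stopgap is to relabel a repeated hour as the next expected hour when the duplicate is adjacent and that next hour is missing. This is purely a hypothetical heuristic, not something the project does: `relabel_adjacent_duplicates` is an illustrative name, and it assumes the recognized values are still in column order and only the 02:00 header was lost, which the ATL example suggests but which has not been verified for other airports.

```python
def relabel_adjacent_duplicates(entries, expected_hours):
    """Relabel a repeated hour as the next expected hour that never appeared.

    entries: (hour_label, pax) tuples in the order they were read from the row.
    expected_hours: ordered hour labels for the row, e.g. ["01:00", "02:00"].
    Hypothetical heuristic: assumes values are in column order and only the
    column header was misread. Handles a single repeat per hour; results
    should be checked against the source PDF.
    """
    seen = {hour for hour, _ in entries}
    fixed = []
    for hour, pax in entries:
        if fixed and fixed[-1][0] == hour and hour in expected_hours:
            pos = expected_hours.index(hour) + 1
            # Only relabel if the following hour exists and was never recognized.
            if pos < len(expected_hours) and expected_hours[pos] not in seen:
                hour = expected_hours[pos]
                seen.add(hour)
        fixed.append((hour, pax))
    return fixed


print(relabel_adjacent_duplicates([("01:00", 7), ("01:00", 16)], ["01:00", "02:00"]))
# [('01:00', 7), ('02:00', 16)]
```

Because this guesses at what the recognizer dropped, flagging the affected rows for review is probably safer than silently rewriting them.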