Closed mikelor closed 2 years ago
After researching this, it appears that the form-recognizer is not interpreting the column for 02:00. This is similar to the issue I reported to Microsoft for the 2.1 recognizer, but it appears to occur in this version as well. https://docs.microsoft.com/en-us/answers/questions/577259/form-recognizer-change-in-behavior-in-21-vs-20-rec.html
The cvtPdfToJson step seems to be adding multiple instances of an hour, causing a data issue.
As an example, the 03.27.2019 data for ATL shows 7 Pax for the Main Checkpoint at 01:00 and 16 Pax at 02:00. The TSAThroughputApp is creating two 01:00 nodes for the ATL Main Checkpoint instead of one 01:00 node and one 02:00 node.
This throws off the numbers. I'm not yet sure how many airports/rows are affected. The result is an "undercount", since the cvtJsonToCsv step averages the two nodes that share an hour. In the example above, the 01:00 hour would show up as (7 + 16) / 2 = 11.5 instead of 7, and there would be no separate 02:00 entry for the 16.
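To make the arithmetic concrete, here is a minimal Python sketch of the effect. It is not the actual cvtPdfToJson or cvtJsonToCsv code: the `(hour, pax)` tuple layout and the helper names `average_by_hour` and `find_duplicate_hours` are assumptions made up for illustration. It shows how averaging nodes that share an hour label produces 11.5, and one possible way to flag the duplicated hour before the averaging step runs.

```python
from collections import defaultdict

def average_by_hour(nodes):
    """Collapse nodes that share an hour label into their mean.

    Hypothetical stand-in for the averaging described in this issue.
    """
    grouped = defaultdict(list)
    for hour, pax in nodes:
        grouped[hour].append(pax)
    return {hour: sum(values) / len(values) for hour, values in grouped.items()}

def find_duplicate_hours(nodes):
    """Return the hour labels that appear more than once for a checkpoint."""
    counts = defaultdict(int)
    for hour, _pax in nodes:
        counts[hour] += 1
    return sorted(hour for hour, count in counts.items() if count > 1)

# What the PDF contains for ATL Main Checkpoint (03.27.2019, from this issue).
expected = [("01:00", 7), ("02:00", 16)]
# What the conversion step produces: the 02:00 value is labelled 01:00.
observed = [("01:00", 7), ("01:00", 16)]

averaged = average_by_hour(observed)
print(averaged)                        # {'01:00': 11.5}
print(find_duplicate_hours(observed))  # ['01:00']
print(find_duplicate_hours(expected))  # []
```

A check like `find_duplicate_hours` could be run over each checkpoint's nodes to estimate how many airports/rows are affected.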