I used a local Docker server to parse the attached document (pythonlearn.pdf), which has 239 pages. However, the ingestor parsed only 158 pages and discarded the remaining content. Is this a bug?
Here are the logs:
processing page: 140 Number of p_tags.... 178
processing page: 141 Number of p_tags.... 4
processing page: 142 Number of p_tags.... 251
processing page: 143 Number of p_tags.... 303
processing page: 144 Number of p_tags.... 322
processing page: 145 Number of p_tags.... 287
processing page: 146 Number of p_tags.... 330
processing page: 147 Number of p_tags.... 308
processing page: 148 Number of p_tags.... 265
processing page: 149 Number of p_tags.... 312
processing page: 150 Number of p_tags.... 298
processing page: 151 Number of p_tags.... 346
processing page: 152 Number of p_tags.... 412
processing page: 153 Number of p_tags.... 287
processing page: 154 Number of p_tags.... 193
processing page: 155 Number of p_tags.... 5
processing page: 156 192.168.65.1 - - [18/Apr/2024 14:24:54] "POST /api/parseDocument?renderFormat=all HTTP/1.1" 200 -
pythonlearn.pdf
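For anyone trying to reproduce this, here is a minimal Python sketch that scans the server log for the highest page index the ingestor reported. It is only a helper for reading the log output shown above, not part of the ingestor itself; the names `PAGE_RE` and `summarize_pages` are hypothetical, and the pattern assumes log lines keep the `processing page: N Number of p_tags.... M` shape.

```python
import re

# Matches lines such as "processing page: 140 Number of p_tags.... 178".
# The p_tags part is optional because the final log line above lacks it.
PAGE_RE = re.compile(r"processing page: (\d+)(?: Number of p_tags\.+ (\d+))?")


def summarize_pages(log_text):
    """Return (last_page_index, {page_index: p_tag_count or None})."""
    pages = {}
    for match in PAGE_RE.finditer(log_text):
        page = int(match.group(1))
        count = match.group(2)
        pages[page] = int(count) if count is not None else None
    last_page = max(pages) if pages else None
    return last_page, pages


# Sample taken from the tail of the log in this report.
sample = """processing page: 154 Number of p_tags.... 193
processing page: 155 Number of p_tags.... 5
processing page: 156 192.168.65.1 - - [18/Apr/2024 14:24:54] "POST /api/parseDocument?renderFormat=all HTTP/1.1" 200 -"""

last_page, pages = summarize_pages(sample)
print(last_page, pages[last_page])
```

A page whose count comes back as `None` is one where the log shows the page starting but no `p_tags` count before the HTTP response was written, which is where processing appears to stop in the log above.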