nuwandavek / talktopapers


Paper Parser Fails #1

Open data-hound opened 1 year ago

data-hound commented 1 year ago

Hi

This is a really awesome project, and it could become a very handy open tool using @keerthanpg's colab! Anyway, I have been fiddling with the scripts, but I found that not all kinds of papers can be parsed yet. I tried some of the recent NeurIPS papers: https://proceedings.neurips.cc/paper/2021/file/007ff380ee5ac49ffc34442f5c2a2b86-Paper.pdf, https://proceedings.neurips.cc/paper/2021/file/003dd617c12d444ff9c80f717c3fa982-Paper.pdf

However, the parser returned an empty list. Do you have any idea what the issue could be? I first suspected that two-column formatting was the problem, since the first couple of papers I tried were in two-column format and the parser failed on them. But now it's also failing on single-column papers (NeurIPS, the same venue as the demo paper). Maybe pdfplumber would be a better option?
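For anyone who wants to try the pdfplumber idea, here is a minimal sketch, not the project's actual parser. It assumes pdfplumber is installed and that splitting each page into equal vertical strips is a good enough approximation for two-column layouts. The names `column_boxes` and `extract_pages` are hypothetical helpers made up for this example.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, top, x1, bottom)


def column_boxes(bbox: BBox, n_cols: int = 2) -> List[BBox]:
    """Split a page bounding box into n_cols equal vertical strips, left to right."""
    x0, top, x1, bottom = bbox
    step = (x1 - x0) / n_cols
    # Reuse x1 for the last edge so rounding never pushes a strip past the page.
    edges = [x0 + i * step for i in range(n_cols)] + [x1]
    return [(edges[i], top, edges[i + 1], bottom) for i in range(n_cols)]


def extract_pages(pdf_path: str, n_cols: int = 1) -> List[str]:
    """Return one text string per page, reading the strips left to right."""
    # Imported here so column_boxes stays usable without the dependency.
    import pdfplumber  # third-party: pip install pdfplumber

    pages = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            parts = []
            for box in column_boxes(page.bbox, n_cols):
                # extract_text() may return None when a region has no text layer.
                parts.append(page.crop(box).extract_text() or "")
            pages.append("\n".join(parts))
    return pages
```

Calling `extract_pages("paper.pdf", n_cols=2)` on a two-column paper and `n_cols=1` on a single-column one would show whether the raw text comes out at all. If it does, the empty list is more likely caused by a later filtering step than by the PDF reader itself.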

Harsharma2308 commented 1 year ago

The filtered text has some weird lengths, with entries going above 15k? The data frame that gets created has shape (64, 3), but the text lengths look spurious. Maybe some issue with the PDF parser?
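A quick way to check whether those long entries are isolated outliers is to compare each length against the median. This is only a sketch: it assumes the filtered text is available as a plain list of strings, and `flag_long_chunks` and its `factor` threshold are hypothetical, not part of the repo.

```python
from statistics import median
from typing import List, Tuple


def flag_long_chunks(texts: List[str], factor: float = 10.0) -> List[Tuple[int, int]]:
    """Return (index, length) for every chunk longer than factor * median length."""
    lengths = [len(t) for t in texts]
    if not lengths:
        return []
    cutoff = factor * median(lengths)
    return [(i, n) for i, n in enumerate(lengths) if n > cutoff]
```

If only a handful of rows get flagged, the parser is probably merging several paragraphs (or a whole page) into one entry for those rows rather than failing across the board.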
