yuchenlin / rebiber

A simple tool to update bib entries with their official information (e.g., DBLP or the ACL Anthology).
https://yuchenlin.xyz/
MIT License

Some references are filtered by `load_bib_file` #21

Closed. AyanoClarke closed this issue 3 years ago.

AyanoClarke commented 3 years ago

It's a great tool, but when I try to convert my .bib file, which is generated by the BibDesk application, some references are filtered out. Here is a minimal example from my bib file:

@inproceedings{zhang2019heterogeneous,
        author = {Zhang, Chuxu and Song, Dongjin and Huang, Chao and Swami, Ananthram and Chawla, Nitesh V},
        booktitle = {Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
        date-added = {2021-04-03 01:39:20 +0800},
        date-modified = {2021-04-03 01:44:13 +0800},
        keywords = {Recommender system, Graph Neural Network},
        pages = {793--803},
        title = {Heterogeneous graph neural network},
        year = {2019},
        Bdsk-Url-1 = {https://doi.org/10.1145/3292500.3330961}}

I think this is caused by load_bib_file: the last line of this entry contains a `{`, so load_bib_file skips the entry.

However, BibtexParser recognizes this kind of bib file correctly.

tianylin98 commented 3 years ago

Got exactly the same issue. The BibTeX entry that was skipped was:

@misc{he2021deberta,
      title={DeBERTa: Decoding-enhanced BERT with Disentangled Attention}, 
      author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
      year={2021},
      eprint={2006.03654},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

But the weird thing is that the following entry isn't skipped at all:

@misc{devlin2019bert,
      title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding}, 
      author={Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
      year={2019},
      eprint={1810.04805},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

AyanoClarke commented 3 years ago

Have you tried the beta web app? This entry is kept after the conversion there.

tianylin98 commented 3 years ago

I tried a file that contains only this entry, and the entry remains. However, my original bib file still suffers from this problem, so maybe something else triggers it (e.g., comments in the bib file).

yangxqiao commented 3 years ago

Thanks for the report! I updated load_bib_file to handle the case where the last line of an entry contains two closing curly braces `}}` (#24). Could you try again and see whether the updated function works for you?
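The fix in #24 targets a trailing `}}`. As a more general alternative, sketched here under assumptions rather than taken from rebiber, a line-based loader can track brace depth and close an entry whenever the depth returns to zero, so a BibDesk-style `...}}` line and a lone `}` are treated alike. The helper name `split_bib_entries` is hypothetical, and the sketch assumes braces are balanced within each entry, as BibTeX requires.

```python
from typing import List


def split_bib_entries(lines: List[str]) -> List[str]:
    """Group .bib lines into entries by tracking brace depth.

    Hypothetical helper, not rebiber's load_bib_file. An entry starts at a
    line beginning with '@' and ends on the line where the brace depth
    returns to zero, so a final 'Bdsk-Url-1 = {...}}' line is handled the
    same way as a lone '}'.
    """
    entries: List[str] = []
    buffer: List[str] = []
    depth = 0
    opened = False  # becomes True once the entry's opening brace is seen
    for line in lines:
        if not buffer and not line.lstrip().startswith("@"):
            continue  # text between entries is ignored, as BibTeX does
        buffer.append(line)
        # Net brace change on this line (assumes balanced braces per entry).
        depth += line.count("{") - line.count("}")
        opened = opened or "{" in line
        if opened and depth <= 0:
            entries.append("".join(buffer))
            buffer, depth, opened = [], 0, False
    return entries


bibdesk_entry = [
    "@inproceedings{zhang2019heterogeneous,\n",
    "        year = {2019},\n",
    "        Bdsk-Url-1 = {https://doi.org/10.1145/3292500.3330961}}\n",
]
print(len(split_bib_entries(bibdesk_entry)))  # 1
```

Because an entry ends as soon as the depth reaches zero, text after the closing brace on the same line (such as a trailing `%` comment, as long as the comment itself contains no braces) does not prevent the entry from being recognized. An unterminated entry at the end of the file is silently dropped here; a real loader might report it instead.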

yangxqiao commented 3 years ago

Hi, thanks for the comment! I think the DeBERTa entry is skipped because the paper is not in the database. It should be listed under ICLR 2021, but the conference proceedings only became available a few days ago, so the database doesn't include them yet.

tianylin98 commented 3 years ago

That doesn't explain why the toolkit doesn't skip this entry when I try a file that contains only this entry. I can't figure out the possible reason yet. I guess it isn't a big issue, though?

yangxqiao commented 3 years ago

I added this entry to the example_input.bib file and found that it still appears in the output file after running the tool. In principle, whether or not this is the only entry, it should be in the output file. So, as you said, maybe something else in the bib file triggers it. Would you mind sharing a longer example, or the input bib file you used when the issue occurred?
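When an entry silently disappears, comparing citation keys between the input and output files shows exactly which entries were dropped. A minimal sketch, assuming every entry starts with `@type{key,` as in the examples in this thread; the helpers `entry_keys` and `missing_keys` are hypothetical and not part of rebiber.

```python
import re
from typing import List

# Hypothetical pattern: "@type{key," at the start of an entry.
_ENTRY_KEY = re.compile(r"@\s*(\w+)\s*\{\s*([^,\s]+)\s*,")
# Block types that carry no citation key.
_NON_ENTRIES = {"string", "comment", "preamble"}


def entry_keys(bib_text: str) -> List[str]:
    """Return citation keys in order of appearance."""
    return [
        key
        for entry_type, key in _ENTRY_KEY.findall(bib_text)
        if entry_type.lower() not in _NON_ENTRIES
    ]


def missing_keys(input_text: str, output_text: str) -> List[str]:
    """Keys present in the input .bib text but absent from the output."""
    kept = set(entry_keys(output_text))
    return [key for key in entry_keys(input_text) if key not in kept]


# Usage (paths are placeholders):
# with open("input.bib", encoding="utf-8") as f_in, \
#         open("output.bib", encoding="utf-8") as f_out:
#     print(missing_keys(f_in.read(), f_out.read()))
```

Running this on the original bib file and the tool's output would list the dropped keys, which makes it easier to find the neighboring entry that triggers the problem.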

tianylin98 commented 3 years ago

I've just figured out the problem: there was a comment after the closing brace of the previous entry, e.g.:

@misc{qiu2020blockwise,
      title={Blockwise Self-Attention for Long Document Understanding}, 
      author={Jiezhong Qiu and Hao Ma and Omer Levy and Scott Wen-tau Yih and Sinong Wang and Jie Tang},
      year={2020},
      eprint={1911.02972},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
} %foo bar

@misc{he2021deberta,
      title={DeBERTa: Decoding-enhanced BERT with Disentangled Attention}, 
      author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
      year={2021},
      eprint={2006.03654},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Everything works fine after removing this comment, so I guess the tool has a problem handling comments (the comment seems to cancel out the closing brace on the same line). @yangxqiao
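If the loader recognizes the end of an entry by checking whether a stripped line is a closing brace (an assumption about `load_bib_file`, not confirmed in this thread), one narrow fix is to drop a `%` comment that follows closing braces before that check. The hypothetical helper `strip_closing_comment` below only touches lines made of closing braces plus a comment, so `%` characters inside field values, such as `50\%` or URL escapes like `%20`, are left alone.

```python
import re

# Hypothetical pattern: a line consisting only of closing braces followed by a
# '%' comment, e.g. "} %foo bar". Lines with field values never match, so a
# '%' inside a value (a URL like ".../a%20b", or "50\%") is not touched.
_CLOSING_WITH_COMMENT = re.compile(r"^(\s*\}+)\s*%.*$")


def strip_closing_comment(line: str) -> str:
    """Return the line without its newline and without a trailing comment
    that follows closing braces."""
    return _CLOSING_WITH_COMMENT.sub(r"\1", line.rstrip("\n"))


# Example: the closing line from the report above.
cleaned = strip_closing_comment("} %foo bar\n")
print(cleaned.strip() == "}")  # True
```

Restricting the pattern to brace-only lines is a deliberate trade-off: a general "strip everything after `%`" rule would corrupt percent-encoded URLs and unbalance the braces of the line it touched.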