sign-language-processing / sign-language-processing.github.io

Documentation and background of sign language processing
117 stars 9 forks source link

Feature/fix bare citations script #115

Closed cleong110 closed 4 months ago

cleong110 commented 4 months ago

Fixes #113

Reads list of citation keys, uses regex to check for those keys, but if they don't have @, add it. Also, print out line number and key that was fixed.

cleong110 commented 4 months ago

That would be better! I will set that up

On Sat, Jun 29, 2024 at 5:10 AM Amit Moryossef @.***> wrote:

@.**** requested changes on this pull request.

In Makefile https://github.com/sign-language-processing/sign-language-processing.github.io/pull/115#discussion_r1659675570 :

@@ -18,6 +18,7 @@ dst/thesis.pdf: dst/index_shortcode.md src/references.bib

dst/index.md: src/index.md src/markdown_fix.sh src/formats.md dst tmp/datasets.md dst/assets cat src/index.md > $@

  • python src/fix_bare_citations.py src/references.bib $@

i think it would be best if this was not a "fixer" (since it does not fix the source code, and allows us to write errors) instead, i think this should be a "linter" - write a small github action that runs this script, and throws an error if this happens. then you get quick feedback in the PR

— Reply to this email directly, view it on GitHub https://github.com/sign-language-processing/sign-language-processing.github.io/pull/115#pullrequestreview-2149410431, or unsubscribe https://github.com/notifications/unsubscribe-auth/A5FSTNLXVPFYQOV7L27JDBTZJZ2ZHAVCNFSM6AAAAABKCIQ44CVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCNBZGQYTANBTGE . You are receiving this because you authored the thread.Message ID: <sign-language-processing/sign-language-processing.github. @.***>

cleong110 commented 4 months ago

Updated the script, renamed it, added a deploy action that calls it. Should now give an error on PR if there is a bare citation.

Also it no longer bugs you about citations in comments

cleong110 commented 4 months ago

One thing I haven't been able to figure out is how to find the line number of an issue in the original content.

I tried

lines = content.splitlines()

followed later by

line_index = lines.index(line)

Which adds almost zero overhead to run time. That was cool, I thought.

But upon testing I found that if there's two identical lines for whatever reason, it will always find the first one.

I also investigated [i for i, content_line in enumerate(content_lines) if content_line == matching_line], and there's still no overhead that I can detect, we're still around 0.04 seconds.

But I just couldn't get it to work right easily.

So never mind. It would be nice-to-have, but maybe something we can add later.

cleong110 commented 4 months ago

Another nice-to-have (which I gave up on) would be to show the index in the line, something like

line xx, column yy:
    lorem ipsum foo_etal_2024 dolor sit amet
                ^

And I got something working, but it failed in the case where we had a very long line.

line xx, column yy:
    lorem ipsum foo_etal_2024 dolor sit amet blah blah
line wrap happens blah blah blah and now it looks bad
                ^