rseng / rse

tools for assessment and categorization of research software
https://rseng.github.io/rse/
Mozilla Public License 2.0

ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/tree/v0.6.5. #78

Closed by NickleDave 1 year ago

NickleDave commented 1 year ago

Description

I'm back to trying to run rse import on an edited copy of the google-sheet from @rhine3's bioacoustics-software repo.
It now crashes with this error:

Found 70 results
ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/tree/v0.6.5.

Full traceback below. I think rse makes it through all the URLs now, and the source of this crash is something else? Unless I'm reading the traceback wrong somehow.

What I Did

$ rse import --type google-sheet "https://docs.google.com/spreadsheets/d/e/2PACX-1vQkPsu14BG0bErrY0thXymfS55be0spEVX_WpWm2Yy3We8swMO0sIb3iD4Sg-i1lWnxSsiiN5JmWAD-/pub?gid=0&single=true&output=csv"
INFO:rse.main.import.google-sheet:Found software record: https://github.com/patriceguyot/Acoustic_Indices
INFO:rse.main.import.google-sheet:Found software record: https://www.adobe.com/products/audition.html
INFO:rse.main.import.google-sheet:Found software record: https://www.titley-scientific.com/us/anabat-insight.html
INFO:rse.main.import.google-sheet:Found software record: https://datadryad.org/stash/dataset/doi:10.5061/dryad.221mq23
INFO:rse.main.import.google-sheet:Found software record: https://github.com/ChristianBergler/ANIMAL-SPOT
INFO:rse.main.import.google-sheet:Found software record: https://arbimon.rfcx.org/
INFO:rse.main.import.google-sheet:Found software record: https://soundanalysis.wp.st-andrews.ac.uk/
INFO:rse.main.import.google-sheet:Found software record: https://www.audacityteam.org/download/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/nwolek/audiomoth-scripts
INFO:rse.main.import.google-sheet:Found software record: https://github.com/sarabsethi/audioset_soundscape_feats_sethi2019
INFO:rse.main.import.google-sheet:Found software record: https://autoencoded-vocal-analysis.readthedocs.io/en/latest/index.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/timsainb/AVGN
INFO:rse.main.import.google-sheet:Found software record: http://www.avianz.net/index.php
INFO:rse.main.import.google-sheet:Found software record: http://www.avisoft.com/sound-analysis/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/EricArcher/banter
INFO:rse.main.import.google-sheet:Found software record: https://bitbucket.org/chrisscott/batclassify/src
INFO:rse.main.import.google-sheet:Found software record: https://github.com/macaodha/batdetect
INFO:rse.main.import.google-sheet:Found software record: https://www.batlogger.com/en/products/batexplorer/
INFO:rse.main.import.google-sheet:Found software record: https://www.wsl.ch/en/services-and-products/software-websites-and-apps/batscope-4.html
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/bioacoustics/index.html
INFO:rse.main.import.google-sheet:Found software record: https://birdnet.cornell.edu/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/BirdVox/birdvoxclassify
INFO:rse.main.import.google-sheet:Found software record: https://github.com/BirdVox/birdvoxdetect
INFO:rse.main.import.google-sheet:Found software record: https://github.com/OpenWild/caracal
INFO:rse.main.import.google-sheet:Found software record: https://github.com/vocalpy/crowsetta
INFO:rse.main.import.google-sheet:Found software record: https://github.com/MarineBioAcousticsRC/DetEdit
INFO:rse.main.import.google-sheet:Found software record: https://github.com/DrCoffey/DeepSqueak
INFO:rse.main.import.google-sheet:Found software record: https://github.com/nilomr/fieldtools
INFO:rse.main.import.google-sheet:Found software record: https://github.com/DenaJGibbon/gibbonR-package
INFO:rse.main.import.google-sheet:Found software record: http://www.oldbird.org/glassofire.htm
INFO:rse.main.import.google-sheet:Found software record: https://www.goldwave.com/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/Cdevenish/hardRain
INFO:rse.main.import.google-sheet:Found software record: https://sites.google.com/view/alcore-suzuki/home/harkbird
INFO:rse.main.import.google-sheet:Found software record: https://github.com/vocalpy/hybrid-vocal-classifier
INFO:rse.main.import.google-sheet:Found software record: https://github.com/DanWoodrich/INSTINCT
INFO:rse.main.import.google-sheet:Found software record: http://bioacoustics.us/ishmael.html
INFO:rse.main.import.google-sheet:Found software record: https://www.wildlifeacoustics.com/products/kaleidoscope-pro
INFO:rse.main.import.google-sheet:Found software record: https://meridian.cs.dal.ca/2015/04/12/ketos/
INFO:rse.main.import.google-sheet:Found software record: https://koe.io.ac.nz/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/shyamblast/Koogu/tree/v0.6.5
INFO:rse.main.import.google-sheet:Found software record: https://librosa.org/librosa/
INFO:rse.main.import.google-sheet:Found software record: https://rflachlan.github.io/Luscinia/
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/monitoR/index.html
INFO:rse.main.import.google-sheet:Found software record: https://marce10.github.io/ohun/index.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/kitzeslab/opensoundscape
INFO:rse.main.import.google-sheet:Found software record: https://www.pamguard.org/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/TaikiSan21/PAMr
INFO:rse.main.import.google-sheet:Found software record: https://github.com/YannickJadoul/Parselmouth
INFO:rse.main.import.google-sheet:Found software record: https://www.fon.hum.uva.nl/praat/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/shivChitinous/prinia-project
INFO:rse.main.import.google-sheet:Found software record: https://ravensoundsoftware.com/software/raven-lite/
INFO:rse.main.import.google-sheet:Found software record: https://ravensoundsoftware.com/software/raven-pro
INFO:rse.main.import.google-sheet:Found software record: https://www.reaper.fm/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/scikit-maad/scikit-maad
INFO:rse.main.import.google-sheet:Found software record: https://docs.scipy.org/doc/scipy/reference/signal.html
INFO:rse.main.import.google-sheet:Found software record: http://dx.doi.org/10.6084/m9.figshare.3792780
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/seewave/index.html
INFO:rse.main.import.google-sheet:Found software record: https://www.sonicvisualiser.org/
INFO:rse.main.import.google-sheet:Found software record: https://sonobat.com/
INFO:rse.main.import.google-sheet:Found software record: https://doi.org/10.1080/09524622.2013.827588
INFO:rse.main.import.google-sheet:Found software record: https://soundata.readthedocs.io/en/latest/
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/soundecology/vignettes/intro.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/macster110/aipam
INFO:rse.main.import.google-sheet:Found software record: https://github.com/rhine3/specky
INFO:rse.main.import.google-sheet:Found software record: https://github.com/YvesBas/Tadarida-L

https://github.com/YvesBas/Tadarida-D

https://github.com/YvesBas/Tadarida-C
INFO:rse.main.import.google-sheet:Found software record: https://www.cetus.ucsd.edu/technologies_triton.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/yardencsGitHub/tweetynet
INFO:rse.main.import.google-sheet:Found software record: https://github.com/vocalpy/vak
INFO:rse.main.import.google-sheet:Found software record: https://github.com/HaroldMills/Vesper
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/warbleR/index.html
Found 70 results
ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/tree/v0.6.5.
Traceback (most recent call last):
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/bin/rse", line 8, in <module>
    sys.exit(main())
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/client/__init__.py", line 520, in main
    main(args=args, extra=extra)
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/client/imp.py", line 28, in main
    importer.create(
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/main/scrapers/googlesheet.py", line 99, in create
    result = update_nonempty(result, data)
  File "/home/pimienta/Documents/repos/coding/opensci/bioacoustics/bioacoustics-software/.venv/lib/python3.10/site-packages/rse/utils/strings.py", line 13, in update_nonempty
    for key, value in source.items():
AttributeError: 'NoneType' object has no attribute 'items'

vsoch commented 1 year ago

It's a malformed row or identifier - I can see it in your logs:

ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/tree/v0.6.5.

But the client should still skip the row and provide an error message. Give this branch a try: https://github.com/rseng/rse/pull/79. I'm going for a run - back after!
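
For readers following along, here is a minimal sketch of how a URL like https://github.com/shyamblast/Koogu/tree/v0.6.5 could end up as the endpoint in the error. It assumes the identifier is built from the last two path segments, which is a guess and not confirmed against the rse source. The function names `naive_repo_id` and `repo_id` are hypothetical.

```python
from urllib.parse import urlparse


def naive_repo_id(url):
    """Hypothetical: treat the LAST two path segments as owner/repo."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return "/".join(parts[-2:])


def repo_id(url):
    """Hypothetical: treat the FIRST two path segments as owner/repo."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        # Not enough segments to name a repository.
        return None
    return "/".join(parts[:2])


url = "https://github.com/shyamblast/Koogu/tree/v0.6.5"

# The trailing /tree/<ref> segments are mistaken for owner/repo.
print("https://api.github.com/repos/" + naive_repo_id(url))

# Ignoring everything after owner/repo gives a usable endpoint.
print("https://api.github.com/repos/" + repo_id(url))
```

Under that assumption, either trimming `/tree/v0.6.5` from the sheet entry or parsing from the front of the path would avoid the bad endpoint.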

NickleDave commented 1 year ago

Thank you for starting that branch but I'm not quite sure I understand your reasoning for skipping a malformed row.

Wouldn't that just fail silently, without the user knowing they'd lost a record?

Maybe it would be helpful here to have a clearer error message, something that prevents us from ending up at the cryptic 'NoneType' object has no attribute 'items'.
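
A rough sketch of what a clearer message could look like: check for a missing lookup result before merging, name the offending URL in a warning, and keep a list of skipped rows so nothing is lost silently. This is not the code in rse or in the linked PR. `update_nonempty` here is a stand-in written for illustration (only its name and the `source.items()` loop come from the traceback), and `merge_rows`, `fetch`, and `fake_fetch` are hypothetical.

```python
import logging

logger = logging.getLogger("import-sketch")


def update_nonempty(result, source):
    """Stand-in helper: copy non-empty values from source into result."""
    for key, value in source.items():
        if value not in (None, "", [], {}):
            result[key] = value
    return result


def merge_rows(rows, fetch):
    """Merge fetched metadata into each row, skipping rows whose lookup fails."""
    merged, skipped = [], []
    for row in rows:
        data = fetch(row["url"])
        if data is None:
            # Name the URL so the user can fix the sheet, instead of
            # crashing later on None.items().
            logger.warning("Skipping %s: no metadata found, check the URL", row["url"])
            skipped.append(row["url"])
            continue
        merged.append(update_nonempty(dict(row), data))
    return merged, skipped


def fake_fetch(url):
    """Pretend lookup that fails for URLs containing /tree/."""
    return None if "/tree/" in url else {"description": "demo"}


rows = [
    {"url": "https://github.com/vocalpy/vak"},
    {"url": "https://github.com/shyamblast/Koogu/tree/v0.6.5"},
]
merged, skipped = merge_rows(rows, fake_fetch)
print(f"merged {len(merged)} rows, skipped {len(skipped)}: {skipped}")
```

Returning the skipped URLs alongside the merged rows lets a caller print a summary at the end, so a skipped record is visible even if the warning scrolls past.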

vsoch commented 1 year ago

Yes, I added a warning.

vsoch commented 1 year ago

It’s up to the user to run it again, but if they have just used up extensive numbers of their git api limit and I throw an error and they lose the parsed data, that’s a very frustrating outcome.

Speaking from personal experience.

NickleDave commented 1 year ago

> if they have just used up extensive numbers of their git api limit and I throw an error and they lose the parsed data, that’s a very frustrating outcome.

Ah, this makes sense.

Could there be a finally block that writes the parsed data even if there's an exception raised?
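
For a flat loop, that idea might look like the sketch below. It is only an illustration under simplifying assumptions: `import_all`, `flaky_parse`, and the JSON output file are hypothetical, and it does not reflect how rse actually structures its import.

```python
import json
import os
import tempfile


def import_all(urls, parse, outfile):
    """Parse each URL; write whatever was parsed even if one URL raises."""
    parsed = []
    try:
        for url in urls:
            parsed.append(parse(url))
    finally:
        # Runs on success and on failure, so completed work is not lost.
        with open(outfile, "w") as fh:
            json.dump(parsed, fh, indent=2)
    return parsed


def flaky_parse(url):
    """Pretend parser that fails on URLs containing /tree/."""
    if "/tree/" in url:
        raise ValueError(f"Cannot parse {url}")
    return {"url": url}


outfile = os.path.join(tempfile.mkdtemp(), "parsed.json")
try:
    import_all(
        [
            "https://github.com/vocalpy/vak",
            "https://github.com/shyamblast/Koogu/tree/v0.6.5",
        ],
        flaky_parse,
        outfile,
    )
except ValueError as err:
    print(f"Import stopped early: {err}")

# The record parsed before the failure is still on disk.
with open(outfile) as fh:
    saved = json.load(fh)
print(saved)
```

The exception still propagates, so the user sees the failure, but the records parsed before it are already written.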

NickleDave commented 1 year ago

> Could there be a finally block that writes the parsed data even if there's an exception raised?

Nvm, I see there's nesting that makes this non-trivial

NickleDave commented 1 year ago

Passing out for the night but will return to try that branch

vsoch commented 1 year ago

Fixed by #79