vocalpy / crowsetta

A tool to work with any format for annotating vocalizations
https://crowsetta.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

DEV/CLN: Drop pandera dependency? #269

Open NickleDave opened 3 months ago

NickleDave commented 3 months ago

Our use of pandera to validate dataframes really adds to the number of things that get installed when you install crowsetta, largely because pandera depends on pydantic.

This makes it more likely that some upstream change will affect people who just want to use crowsetta so their own library can parse annotations; see for example https://github.com/kitzeslab/opensoundscape/issues/1017 and https://github.com/vocalpy/vocalpy/issues/173

I recall looking at "pure Python" libraries for validating dataframes before. I wonder if there's one we could vendor to avoid this dependency, like typedframe maybe.

NickleDave commented 3 months ago

Thinking out loud: I guess most tools are pretty consistent about how they save files, so it would have to be someone working with the exported annotations who corrupts them.

It would be good to know if anyone else had cases where pandera caught some problem with annotation files, and that was helpful. I think most of the validation errors I've gotten have been because of a mistake I made (e.g. building a simple-seq annotation file "manually" with pandas from some one-off annotation format, then trying to load it with crowsetta)

Maybe we just need clearer error messages instead of strict validation
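
A rough sketch of what "clearer error messages instead of strict validation" might look like, using only the standard library. Everything here is hypothetical: the function name `check_simple_seq` and the column names `onset_s`, `offset_s`, and `label` are assumptions for illustration, not crowsetta's actual schema or API. The idea is just to check the few things that tend to go wrong in a hand-built file and say exactly which row or column is the problem.

```python
from numbers import Real

# Assumed column names for a simple-seq-style table (hypothetical;
# adjust to whatever the real schema requires).
REQUIRED_COLUMNS = ("onset_s", "offset_s", "label")


def check_simple_seq(columns):
    """Raise ValueError with a readable message if the table looks malformed.

    ``columns`` maps column name -> list of values, e.g. what
    ``df.to_dict("list")`` returns for a pandas DataFrame.
    """
    # 1. Are all expected columns present?
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        raise ValueError(
            f"Annotation table is missing column(s) {missing}; "
            f"found {sorted(columns)}, expected {list(REQUIRED_COLUMNS)}."
        )

    onsets, offsets = columns["onset_s"], columns["offset_s"]
    if len(onsets) != len(offsets):
        raise ValueError(
            f"onset_s has {len(onsets)} rows but offset_s has {len(offsets)}."
        )

    # 2. Are onsets/offsets numbers, and does each segment end after it starts?
    for row, (onset, offset) in enumerate(zip(onsets, offsets)):
        for name, value in (("onset_s", onset), ("offset_s", offset)):
            if isinstance(value, bool) or not isinstance(value, Real):
                raise ValueError(
                    f"Row {row}: {name} should be a number, got {value!r}."
                )
        if offset <= onset:
            raise ValueError(
                f"Row {row}: offset_s ({offset}) must be greater than "
                f"onset_s ({onset})."
            )
```

With a pandas DataFrame `df`, this could be called as `check_simple_seq(df.to_dict("list"))`. It checks far less than a pandera schema does, which is the trade-off being discussed: fewer dependencies and more readable errors, at the cost of weaker guarantees.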