sign-language-processing / sign-language-processing.github.io

Documentation and background of sign language processing

streamline process for adding a dataset to the table? #30

Open cleong110 opened 6 months ago

cleong110 commented 6 months ago

brainstorming based on #27...

  1. An interactive form that asks for the same fields as the datasets already listed (here's the schema generated previously with genson: datasets_schema.json). Maybe using the Google Forms export feature?
  2. Perhaps in conjunction with that, a way to parse a BibTeX citation to autofill the fields would be nice.
  3. Automatically add the citation to references.bib if it is missing.
  4. Find and show (or even autofix?) inconsistencies between index.md and references.bib, e.g. (a) citation keys that are present in both references.bib and index.md but lack a preceding @ in index.md, or (b) a @whatever in index.md that is listed as @dataset:whatever in references.bib.
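To make item 4 concrete, here is a rough sketch of what such a check could look like, using only the standard library. The function names and the regular expressions are my own assumptions, not anything that exists in the repo, and the citation pattern is deliberately naive (it would also pick up things like email addresses).

```python
import re

# Assumption: BibTeX entries look like "@type{key," and Markdown citations like "@key".
BIB_ENTRY = re.compile(r"@\w+\s*\{\s*([^,\s{}]+)\s*,")
MD_CITATION = re.compile(r"@([A-Za-z0-9_]+(?:[:-][A-Za-z0-9_]+)*)")


def bib_keys(bib_text):
    """Return the set of citation keys defined in a BibTeX string."""
    return set(BIB_ENTRY.findall(bib_text))


def cited_keys(md_text):
    """Return the set of keys cited as @key in a Markdown string."""
    return set(MD_CITATION.findall(md_text))


def bare_keys(md_text, keys):
    """Case (a): keys that occur in the Markdown without a preceding @."""
    found = set()
    for key in keys:
        # Match the key only when it is not part of a longer key or an @citation.
        pattern = r"(?<![@\w:-])" + re.escape(key) + r"(?![\w:-])"
        if re.search(pattern, md_text):
            found.add(key)
    return found


def missing_prefix(md_text, keys, prefix="dataset:"):
    """Case (b): @whatever cited, but only prefix + whatever exists in the .bib."""
    return {k for k in cited_keys(md_text) if k not in keys and prefix + k in keys}
```

Reading index.md and references.bib into strings and passing them to `bare_keys` and `missing_prefix` would give two sets of suspicious keys to show (or autofix).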

Of course, too much feature creep could turn the relatively simple website into a clone of something like the HuggingFace datasets site (huggingface.co/datasets), which is currently out of scope. In that case it'd be better to rework the whole thing as a full-on database with proper infrastructure to support it.

cleong110 commented 6 months ago

Side note, I think it might be interesting to add "linguistic" as a potential feature of a dataset, and I propose one of these as the emoji:

AmitMY commented 6 months ago

I think you have many feature requests here. For example, item 4 in your list amounts to writing an automatic test.

I like things simple - I think JSON is simple enough, but if there were some sort of automatic feedback it would be even better, so a form could work.
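As a sketch of what "automatic feedback" on a JSON entry could mean, the snippet below compares a new dataset entry against the fields used by the entries already listed. The function name and the field names in the usage example are hypothetical, and this is a plain key comparison rather than validation against the genson schema.

```python
def entry_feedback(new_entry, existing_entries):
    """Return a list of human-readable problems with a new dataset entry.

    Assumption: every entry is a flat dict. Fields shared by all existing
    entries are treated as required; fields never seen before are flagged
    as likely typos.
    """
    key_sets = [set(entry) for entry in existing_entries]
    required = set.intersection(*key_sets) if key_sets else set()
    known = set().union(*key_sets)

    problems = []
    for key in sorted(required - set(new_entry)):
        problems.append(f"missing field: {key}")
    for key in sorted(set(new_entry) - known):
        problems.append(f"unknown field: {key}")
    return problems
```

For example, with existing entries that all have `name` and `language`, a new entry spelled `{"name": "C", "langauge": "z"}` would be reported as missing `language` and containing the unknown field `langauge`.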

As for "linguistic" - sure, I think "label" is the nicest one