dvsrepo closed this 1 year ago
Hello @dvsrepo
That's very interesting, what you've done so far. I would definitely be open to sharing a set of flagged records to further improve coordination.
I've incorporated the alpaca-garbage-collector-multilingual model into the Gradio GUI.
Hi @gururise, first of all, thanks and congrats on this important effort.
I'm Dani from Argilla. We've spent some time looking at data quality issues in the Alpaca dataset and its translations. We're helping the teams behind the Spanish and German efforts use Argilla to flag bad or problematic instructions so that they can be fixed later (either manually or with post-processing).
Along the way, we've spent some time labeling AlpacaDataCleaned. It is already of good quality, but there are still examples to improve, so we'd like to contribute.
Today we released this model to help teams clean up Alpaca translations, but it can be used to contribute to this repo too: https://huggingface.co/argilla/alpaca-garbage-collector-multilingual
We've also deployed this space for browsing and validating the records. This is what it shows for last night's version of AlpacaCleaned (login with argilla/1234).
We plan to spend some more time labeling and contributing back to this project. My question is whether it would be possible to share a set of flagged records (with positional ids as in the original JSON) with you to make sure we edit them in the right way. For example, what to do with requests related to attached photos, paintings, and so on.
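To make the idea of "flagged records with positional ids" concrete, here is a minimal sketch of what such an export could look like. It assumes the dataset is a JSON list of records with `instruction`, `input`, and `output` fields, and that the positional id is simply the zero-based index in that list. The helper names and the keyword heuristic are hypothetical, for illustration only; this is not the actual labeling workflow or the model mentioned above.

```python
import json


def flag_records(records, is_flagged):
    """Return flagged records paired with their position in the original list."""
    return [
        {"id": idx, "record": rec}
        for idx, rec in enumerate(records)
        if is_flagged(rec)
    ]


def mentions_attachment(rec):
    """Hypothetical heuristic: the instruction refers to media that is not present."""
    text = rec["instruction"].lower()
    return any(word in text for word in ("attached", "photo", "painting"))


# Toy data in the assumed instruction/input/output layout.
records = [
    {"instruction": "Add 2 and 3.", "input": "", "output": "5"},
    {"instruction": "Describe the attached photo.", "input": "", "output": "The photo shows a sunset."},
]

flagged = flag_records(records, mentions_attachment)
print(json.dumps(flagged, indent=2))
```

Keeping the original index next to each record means a fix can be applied back to the source JSON without relying on text matching.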