mozilla / bugbug

Platform for Machine Learning projects on Software Engineering
Mozilla Public License 2.0

Consider moving to a third-party library to handle datasets (e.g. the huggingface datasets library) #4377

Open marco-c opened 3 months ago

marco-c commented 3 months ago

This would let us drop the custom code in db.py, and it should also improve performance.

marco-c commented 5 days ago

The first step here should be to use the dataset library inside db.py, changing only the implementation of the various functions while keeping the same interface.
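
A minimal sketch of what that first step could look like, assuming the module exposes `read`, `write` and `append` functions (these names, `Backend`, `JsonLinesBackend` and `set_backend` are illustrative here, not taken from the actual db.py). Callers keep using the module-level functions while the storage implementation behind them is swapped. The JSON Lines backend below is only a standard-library stand-in; a wrapper around the chosen dataset library would implement the same two methods.

```python
import json
import os
from typing import Any, Dict, Iterable, Iterator, Protocol


class Backend(Protocol):
    """Hypothetical storage backend; a dataset-library wrapper would implement this."""

    def read(self, path: str) -> Iterator[Dict[str, Any]]: ...

    def write(self, path: str, records: Iterable[Dict[str, Any]]) -> None: ...


class JsonLinesBackend:
    """Stand-in backend (assumption): one JSON object per line, stdlib only."""

    def read(self, path: str) -> Iterator[Dict[str, Any]]:
        # A missing file is treated as an empty database.
        if not os.path.exists(path):
            return
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)

    def write(self, path: str, records: Iterable[Dict[str, Any]]) -> None:
        with open(path, "w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")


_backend: Backend = JsonLinesBackend()


def set_backend(backend: Backend) -> None:
    """Swap the implementation without touching any caller."""
    global _backend
    _backend = backend


# The public interface: callers only ever see these functions.
def read(path: str) -> Iterator[Dict[str, Any]]:
    return _backend.read(path)


def write(path: str, records: Iterable[Dict[str, Any]]) -> None:
    _backend.write(path, records)


def append(path: str, records: Iterable[Dict[str, Any]]) -> None:
    # Materialize existing records before rewriting the same path.
    existing = list(read(path))
    write(path, existing + list(records))
```

With this shape, code that calls `read(path)` or `append(path, records)` does not change when `set_backend` installs a different implementation, which is what makes the later removal of the wrapper a separate, smaller step.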

Then, as a next step, we can simplify the rest of the code to directly use the dataset library and remove db.py completely.