src-d / ml-backlog

Issues belonging to source{d}'s Machine Learning team which cannot be related to a specific repository.
Publish the dataset of identifiers split by the biLSTM model #78

Open irinakhismatullina opened 5 years ago

irinakhismatullina commented 5 years ago

In our identifiers dataset, the splits are made by the TokenParser, which relies on primitive heuristics.

Now that the biLSTM model has finally been published, we can add the splits produced by that model to the dataset.

I already have them computed and can publish them right away if I get instructions, or I can hand all the data over to whoever is responsible for publishing.

irinakhismatullina commented 5 years ago

@zurk @vmarkovtsev

zurk commented 5 years ago

Could you please attach a file with the diff between the old and new datasets so we can see exactly what changed? That would be very helpful.
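
A diff of this kind could be produced along the following lines. This is only a sketch: the CSV layout, the column names `identifier` and `split`, and the sample rows are assumptions made for illustration, not the actual format of the identifiers dataset.

```python
import csv
import io


def load_splits(csv_file):
    """Read rows with hypothetical columns (identifier, split) into a dict."""
    reader = csv.DictReader(csv_file)
    return {row["identifier"]: row["split"] for row in reader}


def diff_splits(old, new):
    """Return sorted (identifier, old_split, new_split) tuples for every
    identifier present in both datasets whose split differs."""
    return sorted(
        (ident, old[ident], new[ident])
        for ident in old.keys() & new.keys()
        if old[ident] != new[ident]
    )


# Tiny made-up samples standing in for the real dataset files.
old_csv = io.StringIO("identifier,split\ngetfoobar,getfoobar\nparse_xml,parse xml\n")
new_csv = io.StringIO("identifier,split\ngetfoobar,get foo bar\nparse_xml,parse xml\n")

changed = diff_splits(load_splits(old_csv), load_splits(new_csv))
for ident, old_split, new_split in changed:
    print(f"{ident}: {old_split!r} -> {new_split!r}")
```

With real files, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`, and the list of changed rows could be written out with `csv.writer` and attached to the issue.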

vmarkovtsev commented 5 years ago

Assigning to Waren. TBD when he returns from vacation.