unicef / kindly

GNU Affero General Public License v3.0

Combine training dataset #109

Closed sabinevidal closed 2 years ago

sabinevidal commented 2 years ago

For task 1 in #108:

Script to combine the current training data in kindly_luis_export.json into the format [text], [label_number], where the label number is determined by whether the intent in the original dataset is 'bullying' or was set to 'ignore' or 'none'.

The original script was not differentiating between utterances[index].intent values when the intent was given as a string, but it worked when the intent strings were in an array. Possibly we don't need the bully variable?
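
To illustrate the string-versus-array point, here is a minimal sketch of the labelling step, not the script in this PR. It assumes the export has a top-level "utterances" list whose items carry "text" and "intent" keys, and that 1 means bullying and 0 means everything else; the function names and the label numbers are hypothetical.

```python
import json

# Assumption: 1 marks bullying, 0 marks everything else ('ignore', 'none').
BULLYING_LABEL = 1
OTHER_LABEL = 0


def label_for(intent):
    """Return the numeric label for an intent given as a string or a list of strings."""
    # Normalise to a list so both shapes go through the same comparison.
    intents = [intent] if isinstance(intent, str) else list(intent or [])
    is_bullying = any(name.strip().lower() == "bullying" for name in intents)
    return BULLYING_LABEL if is_bullying else OTHER_LABEL


def combine(export):
    """Turn an export dict into a list of [text, label_number] rows."""
    utterances = export.get("utterances", [])
    return [[u["text"], label_for(u.get("intent"))] for u in utterances]


if __name__ == "__main__":
    # Hypothetical sample data, shaped like the assumed export layout.
    sample = {
        "utterances": [
            {"text": "example one", "intent": "bullying"},
            {"text": "example two", "intent": ["none"]},
        ]
    }
    print(json.dumps(combine(sample), indent=2))
```

Wrapping a bare string in a list before comparing means a single code path covers both shapes of intent.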

nathanfletcher commented 2 years ago

@sabinevidal yes, you're absolutely right. We don't need the bully variable from LUIS in the JSON.

lacabra commented 2 years ago

Thanks @sabinevidal for reverting these changes. This is one valid way to do it. The upside of this approach is that it keeps a linear history; the downside is that it clutters the history with unnecessary commits. Some big projects don't allow rewriting history, but since this project is small and we do allow it, I would prefer another approach that only includes the relevant commits (I acknowledge this is an opinionated take on the matter). So, for the reasons outlined here, and for educational purposes, can you please try the following (search for "force push" for additional context):

  1. Identify the last good commit to keep (either by git log or looking at the commits of this PR), which is bf5d29d
  2. Then run the following to go back to that commit
    git reset --hard bf5d29d
  3. Force push the chosen commit above as the last valid one, effectively discarding anything afterwards:
    git push --force

Let me know if you have any questions, thanks!

lacabra commented 2 years ago

When doing the above, please first make a copy elsewhere of the changes introduced in the commits we will be discarding, so that you can refer to them later and include them in different PRs. Otherwise those changes will be lost forever, as we are erasing any trace of them.

sabinevidal commented 2 years ago

@lacabra and @nathanfletcher Have reverted and removed the changes from this PR (and have created a new branch with the other changes). Thanks for the guidance on this! Understanding the difference between the options to re-write history now 👍

Have also updated the script to combine the contribution data into a JSON file as discussed above.