yuma-m / pychord

Python library to handle musical chords.
https://pypi.python.org/pypi/pychord
MIT License

more qualities based on large data scraping #90

Open eyaler opened 1 year ago

eyaler commented 1 year ago

Hi!

This is a follow-up on some of the comments in issue #34. The analysis is based on 117k songs from UltimateGuitarTabs (1960-2023) in the rock, pop, country and folk genres, totaling 9.9M chord instances and 6000 different chords. It is for my project https://github.com/eyaler/uku3le, which is currently being reworked.

These are the most common issues and qualities that fail parsing and that seem to have sensible solutions:

  1. German notation uses H, Hm for B, Bm and this is relevant also for base chords.
  2. 7sus, 9sus without a following number should probably be synonyms for 7sus4, 9sus4?
  3. Maj7, should be allowed for maj7
  4. mmaj7, mMaj7 should be allowed for mM7
  5. 7sus2 afaiu is (0, 2, 7, 10)
  6. (add9), m(add9), (maj7), m(maj7), same for maj9, (aug), (dim), (sus4), 7(sus4), (sus2), 7(sus2), (2), (4), (5), (7), (9), (11) -> remove brackets
  7. add2 and (add2) afaiu is (0, 2, 4, 7)
  8. 6sus2 afaiu is (0, 2, 7, 9)
  9. E# -> F, B# -> C, Fb -> E, Cb -> B, H# -> C, and also for base chords
  10. maj7sus2 is (0, 2, 7, 11)?
  11. maj7sus4 is (0, 5, 7, 11)?
  12. ends with + or +5 to denote aug
  13. ends with m+ or m+5 or m# or m#5 for (0, 3, 8)?
  14. ends with 7+ to denote 7+5
  15. maj7+5 (or Maj7+5) to denote (0, 4, 8, 11)?
  16. 6sus4 (or just 6sus) afaict is (0, 5, 7, 9)
  17. strip white space so e.g: "C " is "C", "A7 " is "A7"
  18. 7M is probably maj7
  19. 6add9 is 69? and m6add9 to m69
  20. madd11 to (0, 3, 7, 17)?
  21. sus7 is probably 7sus4?
  22. strip asterisks (*)
  23. replace ° or º with dim, also if the quality (ignoring the base) is just 'o'
  24. if the quality (ignoring the base) is just 'M' or 'mi' it is probably safe to assume it is 'm'
  25. fixing caps where obvious: ADD, Add -> add; MAJ, Maj -> maj, SUS...
  26. m(maj9) -> (0, 3, 7, 10, 14) ?
  27. add4add9, add9add4 -> (0, 4, 5, 7, 14)
  28. min7 -> m7
  29. ma7 -> maj7
  30. -5, (-5) -> omit5
  31. maj7#11, maj7+11 -> M7+11
  32. add#11 -> (0, 4, 7, 18) ?
  33. m13 -> (0, 3, 7, 10, 14, 21) ?
  34. s4 -> sus4
  35. 7add11 and 6add11 -> (0, 4, 7, 10, 17) and (0, 4, 7, 9, 17) ?
  36. i also take care of do/re/mi/fa/sol/la/si (case insensitive), which may be followed by #/b/7/m, and also as the base chord
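
To make the interval tuples above easier to check by ear or on paper, here is a minimal sketch that spells a few of the proposed qualities as note names. It does not use pychord: `PROPOSED` and `spell` are hypothetical names for illustration, the tuples are copied from the list above (question marks included, so they are still proposals), and notes are always spelled with flats for simplicity.

```python
# Semitone index -> note name, always spelled with flats (a simplification).
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

# Proposed qualities from the list above, as semitone offsets from the root.
# These are suggestions from this issue, not existing pychord definitions.
PROPOSED = {
    "7sus2": (0, 2, 7, 10),
    "add2": (0, 2, 4, 7),
    "6sus2": (0, 2, 7, 9),
    "maj7sus2": (0, 2, 7, 11),
    "maj7sus4": (0, 5, 7, 11),
    "6sus4": (0, 5, 7, 9),
    "madd11": (0, 3, 7, 17),
}


def spell(root: str, quality: str) -> list[str]:
    """Return the note names of a chord, folding offsets into one octave."""
    base = NOTE_NAMES.index(root)
    return [NOTE_NAMES[(base + offset) % 12] for offset in PROPOSED[quality]]


print(spell("C", "7sus2"))   # ['C', 'D', 'G', 'Bb']
print(spell("C", "madd11"))  # ['C', 'Eb', 'G', 'F']
```

Offsets above 11 (such as 17 for the 11th) wrap around to the same pitch class, so this only shows which notes are in the chord, not their octave.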

i could go on... but the above helped me reduce the song reject rate in my case from 6.3% to 1.1%
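
The root-level fixes (items 1, 9, 17 and 22) can be sketched as a preprocessing step that runs before the chord string reaches the parser. This is only a sketch under assumptions: `normalize_root` and `normalize_chord` are hypothetical helpers, not pychord functions, and H is mapped to B as item 1 suggests. Solfège names (item 36) are left out because something like "Do" is ambiguous with a D chord followed by "o".

```python
import re

# Enharmonic spellings from item 9 that map onto a natural note.
ENHARMONIC = {"E#": "F", "B#": "C", "Fb": "E", "Cb": "B"}

# A root is a letter A-H (H is the German name assumed to mean B) plus an optional accidental.
ROOT_RE = re.compile(r"^([A-H])([#b]?)")


def normalize_root(part: str) -> str:
    """Normalize the leading note name of `part`; leave it alone if there is none."""
    match = ROOT_RE.match(part)
    if match is None:
        return part  # e.g. the "9" in "C6/9" is not a bass note
    letter, accidental = match.groups()
    if letter == "H":
        letter = "B"
    root = letter + accidental
    return ENHARMONIC.get(root, root) + part[match.end():]


def normalize_chord(symbol: str) -> str:
    """Strip asterisks and whitespace, then fix the root and the bass note."""
    symbol = symbol.replace("*", "").strip()
    head, slash, bass = symbol.partition("/")
    return normalize_root(head) + slash + normalize_root(bass)


print(normalize_chord("Hm "))   # Bm
print(normalize_chord("E#7*"))  # F7
print(normalize_chord("Am/H"))  # Am/B
```

Because H is renamed before the enharmonic lookup, "H#" becomes "B#" and then "C", which covers the H# case in item 9 without a separate table entry.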

fixes may be required also in from_note_index()

of course, instead of dealing with every specific case, it would be useful to have generic normalization rules, such as:

  - fixing caps where there is no ambiguity, e.g. ADD, Add -> add
  - fixing strings where there is no ambiguity, e.g. maj -> M
  - removing brackets where there is no ambiguity
  - ends with + or +5 -> aug
  - etc.

such generic rules (where there is no danger of ambiguity) would greatly help in maintaining the qualities table.
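
One way such generic rules could look is an ordered list of regex rewrites applied to the quality part of the chord (everything after the root). This is a rough sketch, not how pychord works: `RULES` and `normalize_quality` are hypothetical names, the rule set is a small sample, and the outputs are just normalized strings that would still need matching entries in the qualities table.

```python
import re

# Ordered (pattern, replacement) pairs; order matters because later rules
# rely on the lowercase forms produced by earlier ones.
RULES = [
    (re.compile(r"[()]"), ""),             # (add9) -> add9, 7(sus4) -> 7sus4
    (re.compile(r"add", re.IGNORECASE), "add"),  # ADD, Add -> add
    (re.compile(r"maj", re.IGNORECASE), "maj"),  # MAJ, Maj -> maj
    (re.compile(r"sus", re.IGNORECASE), "sus"),  # SUS, Sus -> sus
    (re.compile(r"[°º]"), "dim"),          # degree sign -> dim
    (re.compile(r"^min"), "m"),            # min7 -> m7
    (re.compile(r"sus$"), "sus4"),         # 7sus -> 7sus4
    (re.compile(r"^\+5?$"), "aug"),        # quality is exactly + or +5 -> aug
]


def normalize_quality(quality: str) -> str:
    """Apply every rewrite rule once, in order."""
    for pattern, replacement in RULES:
        quality = pattern.sub(replacement, quality)
    return quality


print(normalize_quality("(ADD9)"))  # add9
print(normalize_quality("7SUS"))    # 7sus4
print(normalize_quality("+5"))      # aug
```

The `+` rule is anchored to the whole quality on purpose, so that "7+" and "m+" (items 13 and 14) are not turned into plain aug by accident.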

disclaimer: i do not know anything about music or music theory.

yuma-m commented 1 year ago

Hello @eyaler, thank you for raising this issue. Let me give some general guidance on your findings.

If you want to contribute or discuss each item further, I would appreciate it if you could create separate pull requests and issues.