libAtoms / abcd


Truncated labels appear to be the same #57

Closed eszter137 closed 4 years ago

eszter137 commented 4 years ago

When the beginnings of two labels agree and the labels are truncated to within that shared part, the histogram does not recognise that there are two of them:

$ abcd summary -q calculated_by_eszter -p filename -t 40

info.filename count: 3007 unique: 1
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 3007 /Users/es732/mnt/womble/Eszter/molpro_te...
$ abcd summary -q calculated_by_eszter -p filename -t 80

info.filename count: 3007 unique: 2
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 1507 /Users/es732/mnt/womble/Eszter/molpro_test/water_fits_from_libatoms/files_from_A...
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉  1500 /Users/es732/mnt/womble/Eszter/molpro_test/compressed_nonamer/gold_standard/bsse...

gabor1 commented 4 years ago

I actually quite like this behaviour.

eszter137 commented 4 years ago

I think this can be misleading because the default truncation is very short:

$ abcd summary -q calculated_by_eszter -p filename

info.filename count: 3007 unique: 1
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 3007 /Users/es732/mnt/wom...

Maybe it could print a warning message that the labels differ at the x-th character? Or print them at that length by default?
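For illustration, the check behind such a warning could look roughly like the sketch below. This is not code from abcd: `first_difference` and `collapsed_by_truncation` are hypothetical helpers, and the labels in the usage note are made-up short paths rather than the real filenames above.

```python
import os


def first_difference(labels):
    """Return the 0-based index of the first character at which the labels
    stop agreeing, i.e. the length of their common prefix.

    Hypothetical helper, not part of abcd.
    """
    # os.path.commonprefix compares character by character, not per path component.
    return len(os.path.commonprefix(list(labels)))


def collapsed_by_truncation(labels, width):
    """Return True if cutting every label to `width` characters merges
    labels that are distinct at full length."""
    distinct = set(labels)
    truncated = {label[:width] for label in distinct}
    return len(truncated) < len(distinct)
```

With the made-up labels `"/data/run/water_fits"` and `"/data/run/compressed"`, `first_difference` returns 10, so any truncation width of 10 or less would merge the two labels and could trigger the warning.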

gabor1 commented 4 years ago

Let's see what other people think. What I like about the current behaviour is that it is consistent: it prints a truncated string, and the histogram is made for the thing that is printed.

The ... tells you that there is truncation.

I agree that the "unique: 1" is misleading. That should tell me how many different values exist (without truncation).

eszter137 commented 4 years ago

I also agree that it would be enough to fix the "unique: 1" part.
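
A minimal sketch of the behaviour agreed on above, assuming the summary has the full property values available as a list of strings: the unique count is taken before truncation, while the histogram is still built from the truncated labels that get printed. The names `truncate` and `summarise` are hypothetical and not abcd's actual functions; the default width of 20 is only inferred from the default output shown earlier in this thread.

```python
from collections import Counter


def truncate(label, width):
    """Cut a label to `width` characters and mark the cut with '...'."""
    return label if len(label) <= width else label[:width] + "..."


def summarise(values, width=20):
    """Return (unique, histogram) for a list of string values.

    unique    -- number of distinct values at full length (no truncation)
    histogram -- Counter keyed by the truncated label that would be printed
    """
    unique = len(set(values))
    histogram = Counter(truncate(value, width) for value in values)
    return unique, histogram
```

For example, three copies of the made-up label `"/data/run/water_fits"` plus two of `"/data/run/compressed"` with `width=8` would give `unique` equal to 2 and a single histogram bar `"/data/ru..."` with a count of 5, so the printed bars stay consistent with the printed labels while the count no longer hides the second value.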