nsheff / LOLA

Locus Overlap Analysis: Enrichment of Genomic Ranges
http://code.databio.org/LOLA

Column naming #5

Closed fhalbritter closed 9 years ago

fhalbritter commented 9 years ago

I think you should re-think the names of some of the output columns:

nsheff commented 9 years ago

Please make separate issues for separate complaints :)

"support" just has special significance, so it gets a special name and a special position in the column order. It's equivalent to "a". b/c/d are just there for reference, and don't need to be prioritized in the order.
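To make the naming concrete, here is a minimal Python sketch of a 2x2 contingency table whose first cell is called "support" (the "a" cell). The meanings given to b, c, and d, the `Contingency` and `odds_ratio` names, and the counts are all assumptions for illustration, not taken from LOLA's code.

```python
from typing import NamedTuple


class Contingency(NamedTuple):
    """Cells of a 2x2 table. The roles of b, c and d are assumed here."""
    support: int  # a: user regions that overlap the database set
    b: int        # assumed: other regions that overlap the database set
    c: int        # assumed: user regions that do not overlap it
    d: int        # assumed: other regions that do not overlap it


def odds_ratio(t: Contingency) -> float:
    """Standard 2x2 odds ratio, (a * d) / (b * c); b and c must be non-zero."""
    return (t.support * t.d) / (t.b * t.c)


# Hypothetical counts, chosen only to show the calculation.
table = Contingency(support=30, b=70, c=20, d=880)
print(odds_ratio(table))
```

In a layout like this, `support` is still just the first of the four cells. The special name marks it as the one that is also worth reading on its own.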

The variables have different styles because they come from different places. I guess I will switch everything to camelCase, since it bothers you so much.

The abbreviation is worth saving 1 character because these are interactively explored in R, where space counts, and "rank" is repeated 5 times, so it saves me 5 characters of width in displayed output, where column width is always limited by the column name. It seemed, and still seems, worth it to me.

On 04/20/2015 03:17 PM, Florian Halbritter wrote:

I think you should re-think the names of some of the output columns:

  • why are the four cells of the contingency table called /support/, /b/, /c/, and /d/? why do they not appear together in the table order? --> consider renaming all columns with meaningful variable names or prefix them, e.g. by /cont/ to make clear they're conceptually linked
  • you're using a mix of camelCase and hyphenated variables. ouch. /userSet/, /dbSet/, etc. vs. /cell-type/, /data-source/, etc.
  • for abbreviations that save only one symbol, I sometimes wonder whether it's really necessary to abbreviate them at all, e.g. /rnkSup/ (/rankSup/, or even /rankSupport/ (?)), /maxRnk/ (/maxRank/), etc.


fhalbritter commented 9 years ago

Ok, separate issues in future. Mind you, these are not "complaints", but rather suggestions.

I can think of very few cases where "b", "c", "d" make acceptable variable names.

Thank you. I understand they come from different places, but since you're in control of all places, why not standardize them? Or, if you choose to have different naming conventions for annotation files, and for within R, convert between them as necessary.
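As a rough illustration of the "convert between them" option, here is a minimal Python sketch that maps hyphenated annotation names to camelCase. The `hyphen_to_camel` helper is hypothetical and not part of LOLA. The sample names are the ones mentioned in this issue.

```python
def hyphen_to_camel(name: str) -> str:
    """Convert a hyphenated name such as 'cell-type' to camelCase."""
    first, *rest = name.split("-")
    # Capitalize only the first letter of each later part; leave the rest as-is.
    return first + "".join(part[:1].upper() + part[1:] for part in rest)


annotation_columns = ["cell-type", "data-source", "userSet"]
print([hyphen_to_camel(c) for c in annotation_columns])
# ['cellType', 'dataSource', 'userSet']
```

Names that are already camelCase pass through unchanged, so the same conversion could be applied to every column regardless of where it came from.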

Righto. In that case I'd vote for going 100% teenager. usrSet, dbSet, dscr, pValLog, suprt, fName, ... I've just saved you another 15 characters or thereabout.

Just joking, ignore the last point.

nsheff commented 9 years ago

I propose that the case where b, c, and d correspond to variables in a statistical model is exactly such a case where they are acceptable variable names.