linnarsson-lab / loom-viewer

Tool for sharing, browsing and visualizing single-cell data stored in the Loom file format
BSD 2-Clause "Simplified" License

[decision] How to handle leading/trailing underscores in names #21

Closed: JobLeonard closed this issue 8 years ago

JobLeonard commented 8 years ago

This came up in the form-upload ticket, but since it is a more general infrastructure issue, I think we should decide here how to handle it.

Currently, double underscores have special meaning as separators. Sometimes we combine names into one string, with said double underscores as separators:

PUT /loom/{transcriptome}__{project}__{dataset}

If we allow leading/trailing underscores in names, this can create problems. For example, assume double underscores are interpreted greedily, i.e. as separators as soon as they are encountered, and that the values sent in are the following:

{
    transcriptome: "foo_",
    project: "bar_",
    dataset: "_baz",
}

This would translate to:

PUT /loom/foo___bar____baz

Which in turn would be interpreted as:

{
    transcriptome: "foo",
    project: "_bar",
    dataset: "", // or "_" or "__baz", all of which would create more problems later on
}
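The ambiguity can be reproduced with a minimal Python sketch. It assumes the path segment is built with a plain join on "__" and parsed with Python's left-to-right `str.split`; `join_names` and `split_names` are hypothetical helpers, not functions from the loom-viewer code base. Limiting the split to three fields yields "__baz" as the dataset, while an unlimited split yields an empty third field plus a stray fourth one.

```python
# Hypothetical helpers for the {transcriptome}__{project}__{dataset} scheme;
# these are illustrative and not taken from loom-viewer.
SEP = "__"

def join_names(transcriptome: str, project: str, dataset: str) -> str:
    """Combine the three names into one path segment, separated by '__'."""
    return SEP.join([transcriptome, project, dataset])

def split_names(segment: str) -> dict:
    """Split greedily: each '__' is taken as a separator as soon as it is found."""
    transcriptome, project, dataset = segment.split(SEP, 2)
    return {"transcriptome": transcriptome, "project": project, "dataset": dataset}

original = {"transcriptome": "foo_", "project": "bar_", "dataset": "_baz"}
segment = join_names(**original)
print(segment)               # foo___bar____baz
print(split_names(segment))  # {'transcriptome': 'foo', 'project': '_bar', 'dataset': '__baz'}
print(segment.split(SEP))    # ['foo', '_bar', '', 'baz']
```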

This is disturbingly fragile! To fix this, we can disallow leading/trailing underscores; this is my preferred solution. Another option is to change the separator: at the moment we only allow the 26 letters of the English alphabet, single underscores, and digits anyway, so we have plenty of symbols to choose from.
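For comparison, here is a sketch of the second option, changing the separator. The "." separator and the helper names are hypothetical choices for illustration. The only requirement is that the separator can never occur inside a name, which holds here under the assumption that names consist solely of letters, digits and underscores.

```python
import re

# Hypothetical separator: safe only because it cannot appear in a name.
SEP = "."
# Assumed character set for names: letters, digits and underscores.
NAME_CHARS = re.compile(r"[A-Za-z0-9_]+")

def join_names(*names: str) -> str:
    """Join names with SEP, rejecting any name outside the assumed character set."""
    for name in names:
        if not NAME_CHARS.fullmatch(name):
            raise ValueError(f"invalid name: {name!r}")
    return SEP.join(names)

def split_names(segment: str) -> list:
    """Split on SEP; leading/trailing underscores in names are preserved."""
    return segment.split(SEP)

segment = join_names("foo_", "bar_", "_baz")
print(segment)               # foo_.bar_._baz
print(split_names(segment))  # ['foo_', 'bar_', '_baz']
```

With a separator that names cannot contain, leading and trailing underscores stop mattering, at the cost of changing the code that builds and parses these strings.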

If we decide that separators should always be interpreted greedily, we could allow leading underscores but disallow trailing ones. This requires the fewest rewrites, but I'm not sure I like how fragile it is.
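The greedy variant can be sketched as follows, assuming names are letters and digits joined by single underscores, optionally preceded by one leading underscore. The regex and helper names are hypothetical. Python's `str.split` consumes each "__" as soon as it finds it, so as long as no name ends in an underscore, the first "__" found from the start of a name is always the separator that follows it.

```python
import re

SEP = "__"
# Assumed rule for this option: alphanumeric runs joined by single underscores,
# at most one leading underscore, and no trailing underscore.
NAME = re.compile(r"_?[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*")

def join_names(*names: str) -> str:
    """Join names with '__' after checking each against the assumed rule."""
    for name in names:
        if not NAME.fullmatch(name):
            raise ValueError(f"invalid name: {name!r}")
    return SEP.join(names)

def split_names(segment: str) -> list:
    # str.split scans left to right and takes each '__' as soon as it is found,
    # i.e. the greedy interpretation. Since no name ends with '_' or contains
    # '__', the first '__' after a name's start is the separator after it.
    return segment.split(SEP)

names = ["foo", "_bar", "_baz"]
segment = join_names(*names)
print(segment)                        # foo___bar___baz
print(split_names(segment) == names)  # True
```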

slinnarsson commented 8 years ago

Just disallow leading and trailing underscores for transcriptomes, datasets and projects. Row and column labels can still have leading and trailing underscores, and that's fine.
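A sketch of this rule, assuming the character set from the opening comment (letters, digits and single underscores, with both upper- and lowercase letters assumed allowed). The field names and `check_field` are hypothetical, not the actual loom-viewer validation code.

```python
import re

# Assumed strict rule for names: alphanumeric runs joined by single
# underscores, so no leading, trailing or doubled underscores.
STRICT_NAME = re.compile(r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*")

# Hypothetical field names for the values combined into the URL path.
STRICT_FIELDS = {"transcriptome", "project", "dataset"}

def check_field(field: str, value: str) -> bool:
    """Return True if `value` passes the underscore rule for `field`."""
    if field in STRICT_FIELDS:
        return STRICT_NAME.fullmatch(value) is not None
    # Row and column labels: this sketch applies no underscore rule to them.
    return True

print(check_field("dataset", "_baz"))      # False
print(check_field("dataset", "foo_bar"))   # True
print(check_field("row_label", "_gene_"))  # True
```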

JobLeonard commented 8 years ago

Ok, implemented.

The row and column labels are passed via CSV, where I currently only change semicolons, so nothing needs to change there.

I'm assuming that regression labels can have leading/trailing underscores too?

slinnarsson commented 8 years ago

Yes, they are the same as row/column labels.

/Sten

Sten Linnarsson, PhD Professor of Molecular Systems Biology Karolinska Institutet Unit of Molecular Neurobiology Department of Medical Biochemistry and Biophysics Scheeles väg 1, 171 77 Stockholm, Sweden +46 8 52 48 75 77 (office) +46 70 399 32 06 (mobile)

(In reply to Job van der Zwan's comment of 26 May 2016, 15:35.)