nf-osi / nfportalutils

Utilities for NF Portal project and data management
https://nf-osi.github.io/nfportalutils/
MIT License

safety of bare_syn_id #190

Open allaway opened 6 days ago

allaway commented 6 days ago

https://github.com/nf-osi/nfportalutils/blob/ecb884f6996b4e160dee1dbbc80776c9a1833e45/R/basic_utils.R#L53

Not sure how best to resolve this, but I was trying to process a samplesheet that contains URIs like this: s3://robert-allaway-project-tower-bucket/syn40134517/syn7989838/SL106309_1.fastq.gz

With the current implementation, I get output that looks like this:

[Image: Screenshot 2024-09-16 at 6 27 04 PM]

This happens because the bare_syn_id function automatically retrieves the first synId from the string. In addition, nf-synapse now has a flat option (https://github.com/Sage-Bionetworks-Workflows/nf-synapse/tree/main?tab=readme-ov-file#parameters), which would also break these samplesheets.

For the first issue, I wonder whether it would be sufficient to throw a warning when the string contains multiple synIds, and to add a parameter that lets people select which of the grep results to use by index?
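
A minimal sketch of the warning-plus-index idea, for discussion only. It is not the current bare_syn_id implementation: the function name `extract_syn_id`, the `index` parameter, and the `syn[0-9]+` pattern are all assumptions made for illustration.

```r
# Hypothetical helper, NOT the existing nfportalutils::bare_syn_id.
# Assumes a synId is "syn" followed by one or more digits.
extract_syn_id <- function(uri, index = 1L) {
  stopifnot(is.character(uri), length(uri) == 1L)
  index <- as.integer(index)

  # Collect every synId-like match in the string, in order of appearance.
  matches <- regmatches(uri, gregexpr("syn[0-9]+", uri))[[1]]

  # No synId at all: return NA rather than guessing.
  if (length(matches) == 0L) {
    return(NA_character_)
  }

  # Refuse an index that does not point at one of the matches.
  if (is.na(index) || index < 1L || index > length(matches)) {
    stop(sprintf("index must be between 1 and %d", length(matches)),
         call. = FALSE)
  }

  # More than one synId: tell the caller which one is being used.
  if (length(matches) > 1L) {
    warning(sprintf("Found %d synIds in '%s'; using match %d (%s).",
                    length(matches), uri, index, matches[[index]]),
            call. = FALSE)
  }

  matches[[index]]
}

uri <- "s3://robert-allaway-project-tower-bucket/syn40134517/syn7989838/SL106309_1.fastq.gz"
extract_syn_id(uri)             # warns, returns "syn40134517"
extract_syn_id(uri, index = 2L) # warns, returns "syn7989838"
```

Defaulting `index` to 1 would keep the existing first-match behavior for strings with a single synId, so only samplesheets with nested synIds would see the warning.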

allaway commented 6 days ago

Not sure how to deal with the second issue. Probably the easiest fix is to document that files imported to Tower with the flat method should use the original samplesheet (with syn:// URIs) instead of the "updated" samplesheet.