Dataset versus file concept

sven1103 commented 2 years ago

Thanks a lot for the new release 0.6.0 and the new option --print!

I have noticed that the description of the option states that it lists the datasets:

     --print                print available datasets for the provided samples

What I get is a printout of files, not datasets (example QUK17627B3).

09:07:54.786 [main] INFO  life.qbic.App - Please provide a password for user 'bbbfs01':

09:07:59.237 [main] INFO  life.qbic.model.download.QbicDataDownloader - 1 provided openBIS identifiers have been found: [QUK17627B3]
09:07:59.374 [main] INFO  life.qbic.model.download.QbicDataDownloader - Number of datasets found for identifier QUK17627B3 : 2
09:07:59.374 [main] INFO  life.qbic.model.download.QbicDataDownloader - Files available for download:
09:07:59.488 [main] INFO  life.qbic.model.download.QbicDataDownloader - 2.01 Gb QUK17627B3_tumor.2.fastq.gz
09:07:59.488 [main] INFO  life.qbic.model.download.QbicDataDownloader - 1.94 Gb QUK17627B3_tumor.1.fastq.gz
09:07:59.564 [main] INFO  life.qbic.model.download.QbicDataDownloader - 1.86 kb QUK17627B3_20181227133011_.zip

You can see that the number of datasets is reported to be 2, but the listing shows 3 files.

This is because the composition is:

Dataset 1 ------ 1..n File

Also, when we include the concept of a sample in the ER model, it is:

Sample 1 ------ 0..n Dataset 1 ------ 1..n File

So we need to honor this relation and make it transparent to the user. Keep in mind that you can pass any top-level sample code and postman recursively fetches the datasets.
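To make the relation concrete, here is a minimal sketch of the Sample / Dataset / File hierarchy and of a recursive dataset lookup. It is not postman's actual code: the class names, the `collect_datasets` helper, the dataset codes `DATASET-A` / `DATASET-B`, the parent sample `PARENT-SAMPLE`, and the assignment of files to datasets are all assumptions made for illustration. Only the file names and sizes are taken from the log above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    name: str
    size: str


@dataclass
class Dataset:
    code: str
    files: List[File]  # a dataset holds 1..n files


@dataclass
class Sample:
    code: str
    datasets: List[Dataset] = field(default_factory=list)  # 0..n datasets
    children: List["Sample"] = field(default_factory=list)  # child samples


def collect_datasets(sample: Sample) -> List[Dataset]:
    """Gather the datasets of a sample and of all its descendants."""
    found = list(sample.datasets)
    for child in sample.children:
        found.extend(collect_datasets(child))
    return found


# Hypothetical grouping: which file belongs to which dataset is assumed.
measured = Sample(
    code="QUK17627B3",
    datasets=[
        Dataset("DATASET-A", [
            File("QUK17627B3_tumor.1.fastq.gz", "1.94 Gb"),
            File("QUK17627B3_tumor.2.fastq.gz", "2.01 Gb"),
        ]),
        Dataset("DATASET-B", [
            File("QUK17627B3_20181227133011_.zip", "1.86 kb"),
        ]),
    ],
)
# Hypothetical top-level sample that owns no datasets itself.
top_level = Sample(code="PARENT-SAMPLE", children=[measured])

datasets = collect_datasets(top_level)
file_count = sum(len(d.files) for d in datasets)
print(f"{len(datasets)} datasets, {file_count} files")
```

With this grouping the sketch prints `2 datasets, 3 files`, which is the same mismatch that shows up in the log: the dataset count and the number of listed files are two different things.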

I think that JSON is a more suitable notation for displaying hierarchical structures and should be straightforward to implement.
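As a rough sketch of what such a hierarchical printout could look like, the snippet below nests files under datasets under the sample. The key names, the dataset codes `DATASET-A` / `DATASET-B`, and the assignment of files to datasets are assumptions for illustration, not an existing postman output format; the file names and sizes come from the log above.

```python
import json

# Hypothetical structure: sample -> datasets -> files.
report = {
    "sample": "QUK17627B3",
    "datasets": [
        {
            "dataset": "DATASET-A",  # assumed dataset code
            "files": [
                {"name": "QUK17627B3_tumor.1.fastq.gz", "size": "1.94 Gb"},
                {"name": "QUK17627B3_tumor.2.fastq.gz", "size": "2.01 Gb"},
            ],
        },
        {
            "dataset": "DATASET-B",  # assumed dataset code
            "files": [
                {"name": "QUK17627B3_20181227133011_.zip", "size": "1.86 kb"},
            ],
        },
    ],
}

# Indented JSON keeps the hierarchy readable on the console.
text = json.dumps(report, indent=2)
print(text)
```

The nesting states each sample and dataset exactly once, so nothing has to be repeated per file.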

Best, Sven

sven1103 commented 2 years ago

If users prefer a flat notation, then a TSV printout can be done, but then you need to include columns for the different groups (which is a lot of redundant information). Just a thought.
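For comparison, a flat variant could be sketched like this. The column names, the dataset codes `DATASET-A` / `DATASET-B`, and the assignment of files to datasets are assumptions for illustration; only the file names and sizes are taken from the log above.

```python
import csv
import io

# One row per file; sample and dataset are repeated on every row.
rows = [
    ("QUK17627B3", "DATASET-A", "QUK17627B3_tumor.1.fastq.gz", "1.94 Gb"),
    ("QUK17627B3", "DATASET-A", "QUK17627B3_tumor.2.fastq.gz", "2.01 Gb"),
    ("QUK17627B3", "DATASET-B", "QUK17627B3_20181227133011_.zip", "1.86 kb"),
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(("sample", "dataset", "file", "size"))  # header row
writer.writerows(rows)

tsv = buffer.getvalue()
print(tsv, end="")
```

The sample code appears on every row and each dataset code once per file, which is the redundancy mentioned above. In exchange, the output is easy to process with line-oriented tools.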

KochTobi commented 1 year ago

qPostman lists all files grouped by dataset as of version 1.0.0.

qbicsoftware / postman-cli

Dataset versus file concept #121