tseemann / nullarbor

:floppy_disk: :page_with_curl: "Reads to report" for public health and clinical microbiology
GNU General Public License v2.0

Question about prefilling contigs #259

Open cmkobel opened 4 years ago

cmkobel commented 4 years ago

It is described in the documentation that one can reuse contigs across nullarbor runs by using the prefill option, such that {ID} is substituted with the ID of each sample.

My question is then: is it supported, and predictable, to also use asterisks in these paths to glob for contigs when the directory layout varies more?

example:

prefill:
        contigs.fa: /seq/*/{ID}/contigs.fa

instead of:

prefill:
        contigs.fa: /seq/{ID}/contigs.fa

My motivation is that IDs may be overlapping, thus putting all assemblies in the same folder might be dangerous in the long run.

Best, Carl

tseemann commented 4 years ago

Currently it does not support globbing, but it's a good idea. I was considering making prefill: a list of paths too, instead of a single one. What would you want to happen if there was > 1 match to your pattern?
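For illustration, the list-of-paths idea could look roughly like the sketch below. This is not nullarbor code (nullarbor itself is not written in Python), and the thread does not say how a list would be interpreted. The sketch assumes "first template that points at an existing file wins"; the function name `first_existing` is hypothetical.

```python
import os
from typing import Iterable, Optional


def first_existing(templates: Iterable[str], sample_id: str) -> Optional[str]:
    """Return the first templated path that exists as a file, or None.

    Assumption: templates are tried in the order given, and {ID} is the
    only placeholder. No globbing is involved.
    """
    for template in templates:
        candidate = template.replace("{ID}", sample_id)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Under these assumptions each template costs a single existence check per sample rather than a directory scan.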

cmkobel commented 4 years ago

> What would you want to happen if there was > 1 match to your pattern?

Inform and exit.
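To make that concrete, here is a minimal sketch of the behaviour I have in mind, written in Python purely for illustration. It is not how nullarbor implements prefill, and the function name `resolve_prefill` is hypothetical. It assumes `{ID}` is substituted first and the result is then treated as a glob pattern.

```python
import glob
import sys
from typing import Optional


def resolve_prefill(template: str, sample_id: str) -> Optional[str]:
    """Expand {ID}, glob the result, and return the single match.

    Returns None when nothing matches (nothing to prefill) and exits
    with a message when the pattern is ambiguous.
    """
    # Escape the ID so glob characters inside a sample name are taken literally.
    pattern = template.replace("{ID}", glob.escape(sample_id))
    matches = sorted(glob.glob(pattern))
    if len(matches) > 1:
        # "Inform and exit": refuse to pick one of several candidates.
        sys.exit(
            f"ERROR: {len(matches)} files match {pattern!r} "
            f"for sample {sample_id!r}: " + ", ".join(matches)
        )
    return matches[0] if matches else None
```

With the pattern `/seq/*/{ID}/contigs.fa`, a sample whose ID occurs under two different subfolders would stop the run instead of silently reusing one of the assemblies.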

tseemann commented 4 years ago

How many folders do you expect /*/ will match in your case?

cmkobel commented 4 years ago

That is a very specific question. Let's say a hundred?

tseemann commented 4 years ago

I'm just worried how slow that glob will be for my less fortunate users (and me) stuck on NFS :)

cmkobel commented 4 years ago

I see. Never thought of that.

I guess if I keep my IDs unique anyway, it's not a problem to have all of them in the same folder.

tseemann commented 4 years ago

I'll still leave this as an enhancement - I'm not against adding it. We're all moving to SSD/NVMe with gobs of IOPS soon anyway, right? :)