thierrygosselin / stackr

stackr: an R package to run stacks software pipeline
http://thierrygosselin.github.io/stackr/

Example project info file for paired end sequencing with >1 barcode and multiple lanes #27

Closed: jarekbryk closed this issue 1 year ago

jarekbryk commented 2 years ago

Hi,

it would be very helpful if you could provide an example project info file as the instructions are a bit unclear about what it should look like if one has multiple plates/lanes in the paired-end scenario. For example: should the file have another column called LANES?

Similarly, it would be very helpful to see how to add multiple barcodes per sample (i.e. scenario 5 in section 4.1.1 of the Stacks manual): should they be in the same column, separated with some delimiter? Or should there be two columns for barcodes? If the latter, what should their names be?

A simple example file would clarify this :-)

This looks like a very exciting way to "tidy up" a typically messy Stacks workflow. Thanks a lot!

thierrygosselin commented 2 years ago

Have you read the Get started section?

There is a section called project info file ...

jarekbryk commented 2 years ago

Hi Thierry,

yes I did read this section, that's why I started the issue... You write (for the paired end setup):

Columns prerequisites:

BARCODES: the barcode associated with the sample
INDIVIDUALS: a more appealing sample name (more on this below)
FORWARD: the name of the forward sequencing file e.g. H5CY5BBXY_2_25_1.fastq.gz
REVERSE: the name of the reverse sequencing file e.g. H5CY5BBXY_2_25_2.fastq.gz

and on the multiple lanes scenario (in the FAQ section):

Multiple sequencing lanes or chips ?

If you have several Illumina lanes or Ion Torrent chips, just write them down along the sample and barcodes associated with it. On the same file.

but these descriptions don't provide enough detail to understand how the file should be organised if one has multiple barcodes per sample, or where the information about lanes should go if one has multiple lanes in an experiment. I know I can work this out by trial and error, but it would be very helpful to see an example of the project info file for different setups.
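For readers who land here looking for the example requested above, here is a minimal sketch of one possible reading of the quoted instructions. It is not a layout confirmed by the maintainer. It assumes the file is tab-separated, that it contains only the four quoted columns (BARCODES, INDIVIDUALS, FORWARD, REVERSE), and that "write them down on the same file" means one row per sample, with the lane encoded only in the FASTQ file names. All barcodes, sample names and file names are hypothetical.

```python
import csv
import io

# Column names quoted from the stackr "Get started" text in this thread.
COLUMNS = ["BARCODES", "INDIVIDUALS", "FORWARD", "REVERSE"]

# Hypothetical samples: the same barcodes are reused on two lanes, and the
# lane is identified only by the forward/reverse file names (an assumption).
rows = [
    ("ACGTAC", "POP1_IND001", "LANE1_1.fastq.gz", "LANE1_2.fastq.gz"),
    ("TGCATG", "POP1_IND002", "LANE1_1.fastq.gz", "LANE1_2.fastq.gz"),
    ("ACGTAC", "POP2_IND001", "LANE2_1.fastq.gz", "LANE2_2.fastq.gz"),
    ("TGCATG", "POP2_IND002", "LANE2_1.fastq.gz", "LANE2_2.fastq.gz"),
]


def build_project_info(sample_rows):
    """Return the project info table as tab-separated text with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(sample_rows)
    return buffer.getvalue()


project_info = build_project_info(rows)
print(project_info)
```

Whether stackr accepts this exact layout is untested here. The sketch only makes the ambiguity concrete: nothing in the quoted text says whether a separate LANES column is expected.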

konopinski commented 1 year ago

It seems stackr cannot handle dual inline tags yet. In line 30 the function imports the project info file, but even if you put '\t' as a separator in the first column of the file ("BARCODES"), the importing function adds an escape before it, so it is not exported as a tab in line 132. Other separators won't help, because process_radtags accepts only tabs. One solution would be to add another option to the function's arguments, e.g. 'dual.tags', together with another column format for the tibble ('ccccc'). Another would be to let users supply their own barcode file, which is probably easier to code. If the barcodes are imported from an external file ("02_project_info/barcodesid(...).txt"), maybe there is no need to require them in the project_info file?
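The escaping problem described in this comment, and the workaround of supplying a separate barcode file, can be sketched as follows. This is an illustration in Python, not stackr code. It assumes the Stacks barcode file layout for two inline barcodes is `barcode1<tab>barcode2<tab>sample name`, as in the Stacks manual section referenced earlier in the thread. The barcodes and sample names are hypothetical.

```python
import csv
import io

# A '\t' typed into a table cell is two characters (backslash and 't'),
# not a tab, so it cannot act as a field separator downstream.
typed_cell = r"ACGTAC\tTGCATG"
assert "\t" not in typed_cell
assert len(typed_cell.split("\t")) == 1

# Workaround sketch: keep the two barcodes as separate values and let the
# writer emit real tab characters between them.
samples = [
    ("ACGTAC", "TGCATG", "IND001"),
    ("GATCGA", "CTAGCT", "IND002"),
]


def barcode_file_text(sample_rows):
    """Return barcode-file text: barcode1, barcode2, sample name per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for barcode_1, barcode_2, name in sample_rows:
        writer.writerow([barcode_1, barcode_2, name])
    return buffer.getvalue()


barcodes_text = barcode_file_text(samples)
print(barcodes_text)
```

A file written this way could be passed to process_radtags directly. How stackr would need to change to accept it is the open question raised in the comment.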

thierrygosselin commented 1 year ago

@jarekbryk @konopinski

I share the code I use, but stackr is really just for my personal use and for the collaborators I work with. I test it with very specific stacks flavours, depending on the projects and data I have. I have no interest in testing it with other datasets or in spending time making it work for others. I also have no intention of publishing a paper on this, because it's totally dependent on stacks.

If you have used stacks before and checked the Google group, you will have seen that it's almost a full-time job...

I'll clarify all of this on the readme page of the package so that there are no expectations.

Sorry. You can make a repo, use my code, modify it, contribute, but that is pretty much it.

Best of luck, Thierry

thierrygosselin commented 1 year ago

@jarekbryk @konopinski With my radiator package it's completely different.