Closed cbird808 closed 2 months ago
This is a perfectly reasonable request. I think there was a reason I didn't do it this way in the first place, but I can't think of why that was. It may have had partially to do with the fact that inputs could be any of single or multiple forward and/or reverse reads, depending on the (de)multiplexing strateg(ies, see e.g. #82 ) used for the sequencing run.
I firmly agree that users shouldn't have to rename their files to match internal expectations of the pipeline.
I'd welcome the pull request, but any implementation will need to allow the multiple input scenarios.
Cool. I'm speaking from experience here. My intro to bash was forking ddocent 10 years ago. After I gained experience, I realized one original sin of that software is having an inflexible file naming convention and not following unix conventions. It causes issues to this day.
Here's my thoughts on how to make the changes:
FASTQ=$(basename "$reads" | sed -E 's/.*\.([^.]*(\.gz)?)$/\1/')
, or equivalent for $fwd $rev
fastq
will need to be replaced with the variable containing the file extensionAny thoughts on this?
From: mykle hoban @.>
Sent: Thursday, August 15, 2024 4:24 PM
To: mhoban/rainbow_bridge @.>
Cc: Bird, Chris @.>; Author @.>
Subject: Re: [mhoban/rainbow_bridge] Feature Request: Employ Standard Unix Input Convention for --reads
(Issue #81)
This is a perfectly reasonable request. I think there was a reason I didn't do it this way in the first place, but I can't think of why that was. It may have had partially to do with the fact that inputs could be any of single or multiple forward and/or reverse reads, depending on the (de)multiplexing strateg(ies, see e.g. #82https://github.com/mhoban/rainbow_bridge/issues/82 ) used for the sequencing run.
I firmly agree that users shouldn't have to rename their files to match internal expectations of the pipeline.
I'd welcome the pull request, but any implementation will need to allow the multiple input scenarios.
- Reply to this email directly, view it on GitHubhttps://github.com/mhoban/rainbow_bridge/issues/81#issuecomment-2292284442, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ADBV4SYM7RZ7YDRLXZULL2TZRUL7NAVCNFSM6AAAAABMS4QVX2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJSGI4DINBUGI. You are receiving this because you authored the thread.Message ID: @.**@.>>
Hi @mhoban ,
I've added glob handling functionality for --reads
, --fwd
, and --rev
while maintaining the original functionality described in the manual. Use the attached files in place of the orig (brief README.md
included). I've tested with the test_paired_demuxed.sh
. By my reading of the code, the other tests shouldn't be affected.
As is, I think it will handle any combination of globs from --fwd
and --rev
. It will also handle globs from -reads
that follow the input rules for a glob passed to Channel.fromFilePairs
.
If I were to make one primary change to what I did, it would be to take the verbose NextFlow
code used create a glob that's accepted by Channel.fromFilePairs
from params.fwd
and params.rev
and rewrite it in pure groovy
so it can be a helper. Because i was not familiar with nextflow
, I learned the hard way that making the present nextflow
code a process is a no-go because a channel, which is output by a process, cannot be converted back to a string, which is the only format accepted by Channel.fromFilePairs
.
That being said, the core premise of the code I added, identifying the characters that are(nt) in common between strings, could be repurposed and modified to handle any --reads
glob, as well as --fwd
and --rev
dirs without --reads
.
Let me know how it works for you. ceb_update_1.zip
Thanks for this! I'm in the process of working on some other updates but I will take a look at this and merge in your modifications once those are implemented. I appreciate that you have retained the existing functionality, since things were largely designed as they were for ease of use by less unix-savvy users (viz. grad students in the tobo lab)
Sounds good. Let me know if there are any unintended consequences that I missed.
To allow for any file extension, not just fastq, I think the most efficient solution is to extract it when you extract the sample ID and add it to the reads channel. Then update the processes that you pass reads to and replace fastq with $extension
Get Outlook for Androidhttps://aka.ms/AAb9ysg
From: mykle hoban @.>
Sent: Wednesday, August 21, 2024 2:08:56 AM
To: mhoban/rainbow_bridge @.>
Cc: Bird, Chris @.>; Author @.>
Subject: Re: [mhoban/rainbow_bridge] Feature Request: Employ Standard Unix Input Convention for --reads
(Issue #81)
Thanks for this! I'm in the process of working on some other updates but I will take a look at this and merge in your modifications once those are implemented. I appreciate that you have retained the existing functionality, since things were largely designed as they were for ease of use by less unix-savvy users (viz. grad students in the tobo lab)
— Reply to this email directly, view it on GitHubhttps://github.com/mhoban/rainbow_bridge/issues/81#issuecomment-2301296655, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ADBV4S73QCT2MIGKO4GA3TTZSQ4IRAVCNFSM6AAAAABMS4QVX2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMBRGI4TMNRVGU. You are receiving this because you authored the thread.Message ID: @.***>
Thanks for you contribution, @cbird808! This is now all implemented in the branch 81-command-line and will soon be merged into main. I've also updated all the documentation. Major code changes can be found in 522ae01 and doc updates are in cbbf522.
I changed some stuff from your code but the functionality is all more or less the same. The pipeline should now support arbitrary file extensions through your updates, and it has been updated to look for '*.f*q\' if doing things the old way (so at least .fq, .fastq, .fq.gz, and .fastq.gz will work).
I'll also update the test repository in a branch of the same name (but it will probably be merged soon).
*notably your common-prefix code. Your implementation was very clever, but there was an issue whereby indexOf wasn't guaranteed to match the running index in the takeWhile closure. I updated it to use a simple for-loop and it resolved the problems I was having.
Great! Glad to contribute. In my experience, regardless of skill level, rainbow_bridge will be more widely adopted, and you'll receive fewer inquiries on how to use it, if at least the inputs work like most other packages.
From: mykle hoban @.>
Sent: Friday, August 23, 2024 9:30 PM
To: mhoban/rainbow_bridge @.>
Cc: Bird, Chris @.>; Mention @.>
Subject: Re: [mhoban/rainbow_bridge] Feature Request: Employ Standard Unix Input Convention for --reads
(Issue #81)
Thanks for you contribution, @cbird808https://github.com/cbird808! This is now all implemented in the branch 81-command-linehttps://github.com/mhoban/rainbow_bridge/tree/81-command-line and will soon be merged into main. I've also updated all the documentation. Major code changes can be found in 522ae01https://github.com/mhoban/rainbow_bridge/commit/522ae0172a4076d1a7a5d0834b88fb26e7247a37 and doc updates are in cbbf522https://github.com/mhoban/rainbow_bridge/commit/cbbf522d66c6f42526112338492bd5774405fee8.
I changed some stuff from your code but the functionality is all more or less the same. The pipeline should now support arbitrary file extensions through your updates, and it has been updated to look for '.fq' if doing things the old way (so at least .fq, .fastq, .fq.gz, and .fastq.gz will work).
I'll also update the test repository in a branch of the same name (but it will probably be merged soon).
*notably your common-prefix code. Your implementation was very clever, but there was an issue whereby indexOf wasn't guaranteed to match the running index in the takeWhile closure. I updated it to use a simple for-loop and it resolved the problems I was having.
- Reply to this email directly, view it on GitHubhttps://github.com/mhoban/rainbow_bridge/issues/81#issuecomment-2308001361, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ADBV4S3ENIYA5OZWPIWLAADZS7V3LAVCNFSM6AAAAABMS4QVX2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMBYGAYDCMZWGE. You are receiving this because you were mentioned.Message ID: @.**@.>>
All updates have been merged as of e557b4c395ddfe5f4e81a7b977b92e3114933fbd. Closing for now, but feel free to reopen if you encounter further issues.
Hi, great package!
I have a request, or I'd be happy to do the work and send you pull req if you'd implement these.
In the unix philosophy, every command accepts input file(s) rather than directories. Following this, when calling
--paired
, I suggest making--reads
,--fwd
, and--rev
accept a pattern (glob) to match file names rather than a directory name, and you hardcoding the file path and naming convention. It's trivial to change this and would also make the behavior of--reads
match not only expectation based on all unix commands, but also its behavior when--single
is set. It would also allow you to divorce from locking the end user into havingfastq
in the file name.fq
is also common.With the previous request implemented, you can extract the FASTQ file extension from the argument given for
--reads
,--fwd
, and--rev
With these changes, the end users are freed from having to rename their files to match the expectations of
rainbow_bridge
.