Closed lokeshbio closed 1 year ago
master
branch :x:base
to dev
Hi @lokeshbio,
It looks like this pull-request is has been made against the lokeshbio/ampliseq master
branch.
The master
branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to master
are only allowed if they come from the lokeshbio/ampliseq dev
branch.
You do not need to close this PR, you can change the target branch to dev
by clicking the "Edit" button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.
Thanks again for your contribution!
nf-core lint
overall result: Passed :white_check_mark: :warning:Posted for pipeline commit fbee8c3
+| ✅ 154 tests passed |+
#| ❔ 1 tests were ignored |#
!| ❗ 3 tests had warnings |!
Thanks, this addition looks interesting! I havent investigated any COI data yet, so I might ask basic questions here.
Is this related to https://github.com/nf-core/ampliseq/issues/450, that seems to discuss also stop codons?
I do think it might be better to make the length thresholds adjustable instead of having them hard coded. Coming to think of it, is it even important to have the length filter? Wouldnt it sufficient to have a stop codon filtering (i.e. contains a stop codon -> remove) enough, because a length filter is already implemented in the pipeline?
Unfortunately some standards are not yet satisfied, such as phyton code (see PythonBlack
), general formatting (prettier
/ EditroConfig
). Also, as far as I see, you introduced a new parameter called --filter_coi_asv
and that needs to go into the documentation in nextflow_schema.json
& the default to nextflow.config
.
_edit: also, you seem to have based your work on the master
branch, instead you are supposed to use the dev
branch. Please consider reading https://nf-co.re/docs/contributing/code_formating and other helpful material in the nf-core documentation https://nf-co.re/docs._
Hi Daniel,
In some ways it could be related #450 https://github.com/nf-core/ampliseq/issues/450, but this particular feature is only applicable for the primer-pairs I have mentioned in the pull-request! As, it looks for stop codons from a particular position of the amplicon where the codons are in one reading frame! The reason also why I hardcoded the length-thresholds is because of the same reason that these are the lengths we found that it suits well for these primers. We are in the process of writing a paper discussing all of these results! As there is a length filtering feature already exists anyway! I forgot to add the actual option in the documentation, sorry about that!
Cheers Lokesh Lokeshwaran Manoharan
On Mon, Apr 3, 2023 at 4:24 PM Daniel Straub @.***> wrote:
Thanks, this addition looks interesting! I havent investigated any COI data yet, so I might ask basic questions here.
Is this related to #450 https://github.com/nf-core/ampliseq/issues/450, that seems to discuss also stop codons?
I do think it might be better to make the length thresholds adjustable instead of having them hard coded. Coming to think of it, is it even important to have the length filter? Wouldnt it sufficient to have a stop codon filtering (i.e. contains a stop codon -> remove) enough, because a length filter is already implemented in the pipeline?
Unfortunately some standards are not yet satisfied, such as phyton code (see PythonBlack), general formatting (prettier / EditroConfig). Also, as far as I see, you introduced a new parameter called --filter_coi_asv and that needs to go into the documentation in nextflow_schema.json & the default to nextflow.config.
— Reply to this email directly, view it on GitHub https://github.com/nf-core/ampliseq/pull/568#issuecomment-1494425623, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGVCT36Z2PGT5AF53UOVKLDW7LMSDANCNFSM6AAAAAAWRLUQXU . You are receiving this because you were mentioned.Message ID: @.***>
Thanks for updating the documentation.
but this particular feature is only applicable for the primer-pairs I have mentioned in the pull-request!
That is my criticism. The code you added could be relatively simply adjusted to a wider application area, I think. When changing the following, one could filter all coding amplicons:
bin/filt_InsBiom_asv.py
--reading_frame 1
would mean it checks reading frame starting from position 1 (other options might be 2,3). All ASV with stop codons would be removed. Default would be 0 for no filtering at all. This could be simply an offset in your code line sub_list.append(sub_seq[x:x+3])
-> sub_list.append(sub_seq[x+offset:x+3+offset])
where def offset = params.reading_frame -1
--stop_codons "TAA,TAG"
in your case, default would be "TAA,TAG". That is just to maximize customization and avoid any hard-coding.--filter_coi_asv
would be not necessaryNow, to analyze data with the settings you attempt to add here, running the pipeline with the following parameters should do the trick:
--reading_frame 1 --min_len_asv 403 --max_len_asv 418
(--stop_codons
not needed because default is fine)
Please let me know whether that is indeed an option or whether I am wrong.
Hi Daniel,
Thank you very much for being patient with me and explaining me! I totally agree that the feature should be broadly applicable and it does not require too many changes in the code in itself! I will fix the python script and make a new pull request soon!
Cheers Lokesh Lokeshwaran Manoharan
On Tue, Apr 11, 2023 at 11:19 AM Daniel Straub @.***> wrote:
Thanks for updating the documentation.
but this particular feature is only applicable for the primer-pairs I have mentioned in the pull-request!
That is my criticism. The code you added could be relatively simply adjusted to a wider application area, I think. When changing the following, one could filter all coding amplicons:
- removing the length filter in you python script bin/filt_InsBiom_asv.py
- add parameter taking the reading frame of the amplicon, e.g. --reading_frame 1 would mean it checks reading frame starting from position 1 (other options might be 2,3). All ASV with stop codons would be removed. Default would be 0 for no filtering at all. This could be simply an offset in your code line sub_list.append(sub_seq[x:x+3]) -> sub_list.append(sub_seq[x+offset:x+3+offset]) where def offset = params.reading_frame -1
- add parameter taking a set of stop codons to look for, e.g. --stop_codons "TAA,TAG" in your case, default would be "TAA,TAG". That is just to maximize customization and avoid any hard-coding.
- --filter_coi_asv would be not necessary
Now, to analyze data with the settings you attempt to add here, running the pipeline with the following parameters should do the trick: --reading_frame 1 --min_len_asv 403 --max_len_asv 418 (--stop_codons not needed because default is fine)
Please let me know whether that is indeed an option or whether I am wrong.
— Reply to this email directly, view it on GitHub https://github.com/nf-core/ampliseq/pull/568#issuecomment-1502974087, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGVCT3YL7IDJ6B6T5KA6ISLXAUOYRANCNFSM6AAAAAAWRLUQXU . You are receiving this because you were mentioned.Message ID: @.***>
Sounds great!
Could you base your work please on branch dev
instead of master
? The master branch contains code of the current release, the dev branch contains the current code which will go into the master branch once a new release is made.
Oh yes! I keep forgetting that! Sorry about that.
Cheers Lokesh
On Tue, 11 Apr 2023, 12:22 Daniel Straub, @.***> wrote:
Sounds great!
Could you base your work please on branch dev instead of master? The master branch contains code of the current release, the dev branch contains the current code which will go into the master branch once a new release is made.
— Reply to this email directly, view it on GitHub https://github.com/nf-core/ampliseq/pull/568#issuecomment-1503067528, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGVCT33FD7QT6YGKMOUOQJ3XAUWIDANCNFSM6AAAAAAWRLUQXU . You are receiving this because you were mentioned.Message ID: @.***>
This pull request contains a filtering step of ASVs that are COI based on their length and also their protein coding sequence. These ASVs are generated by any of the following primers:
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).