Closed kfuku52 closed 1 year ago
I'm on it!
Turns out we already implemented that the mapping rate is being obtained from a .json file from the kallisto output. We just never removed the calls to the updated_metadata functions, so the extra files were still being produced. I confirmed this by commenting all those calls out and running the complete pipeline from start to finish on a handful of test samples and amalgkit worked just fine.
During curate there is still a metadata.tsv
file being created in the curate
folder, which contains the mapping rates and is used during the execution of the R script.
I would just completely remove any calls to the updated_metadata
related functions. Shall I remove the functions as well?
Thank you. Could you remove the functions as well? Also, the mapping rates are summarized only in curate
and it may not be convenient because the final step of some use cases may be just merge
or cstmm
, not reaching curate
. Would it be possible to make merge
produces merge/metadata.tsv
that contains the mapping rates just like curate/metadata.tsv
?
Any point after quant should be quite alright. Should I leave get_mapping_rate()
still as a part of curate
(as a backup), or should I move this to merge completely?
get_mapping_rate()
should still be there in curate
because we want to use curate/metadata.tsv
without merging it with merge/metadata.tsv
if the project reaches curate
.
The update was very straight forward. I was hoping to push the SVA alternatives update as well, but I need to work out some kinks still. For now, here's the update: https://github.com/kfuku52/amalgkit/commit/b8d2944ac1f051caf5ee5ce8fc83c89927e4faf0
curate
module, since this one creates the metadata.tsv sheet containing the mapping rate information.Something I've noticed while testing is the following problem with this:
When re-using the same project folder, but different input data, curate
will cause problems, because it specifically never overwrites any metadata.tsv. So it will try to use an old metadata.tsv, even though the expression data may not line up with its contents. The current solution is to either make a new project folder every time, or manually delete the metadata.tsv from the curate
folder.
I see two quick solutions to this:
I prefer the --redo
option. I have managed similar problems by date before, but problems are more likely to occur when incorporating such date-based output files into a pipeline.
Added --overwrite_metadata
or -o
in the last push. I decided not to use --redo
, since --redo
usually refers to all files involved in a rerun.
https://github.com/kfuku52/amalgkit/commit/5104adf34fee3e26395633e62dfdd8e2f7625332
Thanks. Please close the issue if everything is done.
amalgkit merge
fails due to a missing argument (--overwrite_metadata
) when merge/metadata.tsv exists. I added --overwrite_metadata
to merge
in the last commit ( https://github.com/kfuku52/amalgkit/commit/d138af4d353c4f5838eef539edd3ecf5ca7b0a73 ), but is there any reason not to overwrite merge/metadata.tsv despite merge/GENUS_SPECIES is newly generated every time? Also, the option name --overwrite_metadata
is confusing because it implies the update of the file specified by --metadata
, but not.
Traceback (most recent call last):
File "/Users/kf/Dropbox/repos/amalgkit/amalgkit/amalgkit", line 381, in <module>
args.handler(args)
File "/Users/kf/Dropbox/repos/amalgkit/amalgkit/amalgkit", line 77, in command_merge
merge_main(args)
File "/Users/kf/Dropbox/repos/amalgkit/amalgkit/merge.py", line 52, in merge_main
write_updated_metadata(metadata, os.path.join(args.out_dir, 'merge', 'metadata.tsv'), args)
File "/Users/kf/Dropbox/repos/amalgkit/amalgkit/curate.py", line 21, in write_updated_metadata
if not args.overwrite_metadata:
AttributeError: 'Namespace' object has no attribute 'overwrite_metadata'
I believe this part is no longer necessary. Please remove it if correct. @Hego-CCTB I am still waiting for your response to the above problem.
amalgkit merge
fails due to a missing argument (--overwrite_metadata
) when merge/metadata.tsv exists. I added--overwrite_metadata
tomerge
in the last commit ( d138af4 ), but is there any reason not to overwrite merge/metadata.tsv despite merge/GENUS_SPECIES is newly generated every time? Also, the option name--overwrite_metadata
is confusing because it implies the update of the file specified by--metadata
, but not.Traceback (most recent call last): File "/Users/kf/Dropbox/repos/amalgkit/amalgkit/amalgkit", line 381, in <module> args.handler(args) File "/Users/kf/Dropbox/repos/amalgkit/amalgkit/amalgkit", line 77, in command_merge merge_main(args) File "/Users/kf/Dropbox/repos/amalgkit/amalgkit/merge.py", line 52, in merge_main write_updated_metadata(metadata, os.path.join(args.out_dir, 'merge', 'metadata.tsv'), args) File "/Users/kf/Dropbox/repos/amalgkit/amalgkit/curate.py", line 21, in write_updated_metadata if not args.overwrite_metadata: AttributeError: 'Namespace' object has no attribute 'overwrite_metadata'
This should absolutely not happen as this is an argument exclusively used in curate
. Must have slipped in when I was simplifying the subparser. This will be removed.
I believe this part is no longer necessary. Please remove it if correct. @Hego-CCTB I am still waiting for your response to the above problem.
Yup, we don't need this anymore!
ab1ff5d174515f8468cb0d268047847a0ab60230 and https://github.com/kfuku52/amalgkit/commit/418632b70ad4e6e7d0e37cfb6cd38a878bb7eca7
Mapping rate should be obtainable from a kallisto output file, so
updated_metadata
is no longer necessary as we discussed a long time ago. @Hego-CCTB Could you work on that? It would be great if you could apply this update in 2 weeks or so. If you need more time, or if you don't have time in the coming weeks, please let me know.