Closed psimps21 closed 2 years ago
Can you provide the full command you used to run GIANA?
Thanks, Bo
On Wednesday, July 13, 2022 at 1:28 PM, psimps21 wrote in [s175573/GIANA] "Clustering multiple inputs using the -d flag does not aggregate data from each input file" (Issue #4):
Forgive me if I am misusing the -d flag.
I am attempting to cluster sequences from multiple input files and it seems that the output file only contains clusters from the last processed input file in the input directory. To get the functionality of clustering sequences from all files in the input directory I had to manually concatenate the files and then use the concatenated file as input with the -f flag.
The command I ran was of the form
$ python GIANA4.py -d ~/example/dir/foo -O ~/.example/outfile.tsv
where '~/example/dir/foo' contained multiple files with different input data.
I noticed that in the output file generated from this command, the clusters only included data from the most recently processed file as indicated by the command line output 'Processing file name' and the heading of the output file.
Additionally, I observed different clustering results when I ran the above command, compared to when I concatenated all the files in '~/example/dir/foo' to a temporary file (temp_input.tsv) then ran
$ python GIANA4.py -f ~/temp_input.csv -O ~/.example/outfile.tsv
I was under the impression that these two commands would produce the same output.
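For anyone who wants to script the concatenation workaround described above, here is a minimal sketch. It is not part of GIANA: the helper name `concatenate_inputs` and the file names are hypothetical, and it assumes the input files are plain text that can simply be appended to one another (it makes no attempt to handle header rows).

```python
from pathlib import Path


def concatenate_inputs(input_dir, combined_path):
    """Concatenate every regular file in input_dir into combined_path.

    Hypothetical helper, not a GIANA function.
    Returns the list of files that were merged, in the order written.
    """
    input_dir = Path(input_dir)
    combined_path = Path(combined_path)
    # Sort for a reproducible order; skip the output file if it lives in input_dir.
    sources = sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.resolve() != combined_path.resolve()
    )
    with combined_path.open("w") as out:
        for src in sources:
            text = src.read_text()
            out.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return sources
```

The combined file can then be passed to GIANA with the -f flag, as in the second command above.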
When your input is a directory, GIANA will process each file and write the output to another directory. Please change your -O option accordingly.
Thanks, Bo
Thank you for the clarification!