s175573 / GIANA

Ultrafast TCR clustering algorithm based on geometric isometry

Clustering multiple inputs using the -d flag does not aggregate data from each input file #4

Closed psimps21 closed 1 year ago

psimps21 commented 1 year ago

Forgive me if I am misusing the -d flag.

I am attempting to cluster sequences from multiple input files and it seems that the output file only contains clusters from the last processed input file in the input directory. To get the functionality of clustering sequences from all files in the input directory I had to manually concatenate the files and then use the concatenated file as input with the -f flag.

s175573 commented 1 year ago

Can you provide the full command of running GIANA?

Thanks, Bo

psimps21 commented 1 year ago

The command I ran was of the form

    $ python GIANA4.py -d ~/example/dir/foo -O ~/.example/outfile.tsv

where '~/example/dir/foo' contained multiple files with different input data.

I noticed that in the output file generated from this command, the clusters only included data from the most recently processed file as indicated by the command line output 'Processing file name' and the heading of the output file.

Additionally, I observed different clustering results when I ran the above command than when I concatenated all the files in '~/example/dir/foo' into a temporary file (temp_input.tsv) and then ran

    $ python GIANA4.py -f ~/temp_input.tsv -O ~/.example/outfile.tsv

I was under the impression that these two commands would produce the same output.
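
For anyone who wants the pooled behaviour described above, the manual concatenation step can be scripted. The following is only a minimal sketch and not part of GIANA: the function name concatenate_inputs and the has_header switch are hypothetical, and it assumes every file in the directory is a plain-text input with the same column layout.

```python
from pathlib import Path


def concatenate_inputs(input_dir, combined_path, has_header=False):
    """Merge every regular file in input_dir into one file at combined_path.

    Hypothetical helper, not part of GIANA. Assumes all files share the
    same column layout. Returns the number of files that were merged.
    """
    combined_path = Path(combined_path)
    # Collect the inputs first, skipping the combined file itself in case
    # it was written into the same directory by an earlier run.
    files = sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.resolve() != combined_path.resolve()
    )
    with combined_path.open("w") as out:
        for index, path in enumerate(files):
            lines = path.read_text().splitlines()
            if has_header and index > 0:
                lines = lines[1:]  # keep only the first file's header row
            for line in lines:
                if line.strip():  # drop blank lines between files
                    out.write(line + "\n")
    return len(files)
```

The combined file can then be passed to GIANA with the -f flag, as in the second command above.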

s175573 commented 1 year ago

When your input is a directory, GIANA processes each file separately and writes the results to an output directory. Please change your -O option to point to a directory accordingly.
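
As an illustration of this advice, the sketch below builds a directory-mode command in which -O points to a directory rather than a single file. It is a sketch under stated assumptions: the helper name prepare_directory_run is hypothetical, the script name and the -d and -O flags are taken from the command quoted in this thread, and creating the output directory in advance is a precaution, since the thread does not say whether GIANA creates it.

```python
from pathlib import Path


def prepare_directory_run(input_dir, output_dir, script="GIANA4.py"):
    """Return an argument list for a directory-mode GIANA run.

    Hypothetical helper, not part of GIANA. The output directory is
    created up front as a precaution (an assumption, not documented here).
    """
    in_path = Path(input_dir).expanduser()
    out_path = Path(output_dir).expanduser()
    out_path.mkdir(parents=True, exist_ok=True)
    # -d and -O are the flags used in the command quoted in this thread.
    return ["python", script, "-d", str(in_path), "-O", str(out_path)]


# Example (not executed here):
#   import subprocess
#   subprocess.run(prepare_directory_run("~/example/dir/foo", "~/example/out"), check=True)
```

With this layout each input file gets its own result in the output directory, so the clusters are still computed per file and are not pooled across files.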

Thanks, Bo

psimps21 commented 1 year ago

Thank you for the clarification!