phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

failed in merge_samples.py #33

Closed: JingwenD closed this issue 2 years ago

JingwenD commented 2 years ago

Hi,

I am unable to create the merged object using merge_samples.py.

I get an error:

Traceback (most recent call last):
  File "~/CoNGA/conga/scripts/merge_samples.py", line 83, in <module>
    assert exists(bcmap_file)
AssertionError

My code is:

python3 ~/CoNGA/conga/scripts/merge_samples.py \
--samples sampla_list.txt \
--output_clones_file merged_pbmc_clones.tsv \
--output_gex_data merged_pbmc_gex.h5ad \
--organism human 

Thanks for developing CoNGA! Looking forward to your reply.

Best,

Jingwen

phbradley commented 2 years ago

Hi Jingwen, Thanks for trying conga! It looks like the script is having trouble finding the barcode mapping file, which connects the clone_ids in the clones file to the barcodes in the GEX data. Each line of the samples file should correspond to a dataset that can be run on its own through conga, so it should have a clones file and a GEX file. Assuming the clones file was made by the setup_10x_for_conga.py script, there should also be a file with a name like .barcode_mapping.tsv. That's the file the script seems to be having trouble finding. Can you double-check that, for each clones file in the samples file, there is such a barcode mapping file?
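Doing that check by hand gets tedious with many samples, so here is a minimal Python sketch that scans the samples file and reports which clones files lack a mapping file. It rests on two assumptions that this thread does not confirm: that the samples file is tab-separated with a header row containing a `clones_file` column, and that the mapping file is named by appending `.barcode_mapping.tsv` to the clones file path. The function name `find_missing_barcode_maps` is hypothetical and not part of conga; adjust the column name and suffix to match your files.

```python
import csv
import os
import sys

# ASSUMPTION: the barcode mapping file is named by appending this suffix
# to the clones file path; adjust if your files are named differently.
BCMAP_SUFFIX = '.barcode_mapping.tsv'


def find_missing_barcode_maps(samples_tsv):
    """Return (clones_file, expected_bcmap_file) pairs whose mapping file is absent.

    ASSUMPTION: samples_tsv is tab-separated with a header row that
    includes a 'clones_file' column.
    """
    missing = []
    with open(samples_tsv, newline='') as fh:
        for row in csv.DictReader(fh, delimiter='\t'):
            clones_file = row['clones_file']
            bcmap_file = clones_file + BCMAP_SUFFIX
            if not os.path.exists(bcmap_file):
                missing.append((clones_file, bcmap_file))
    return missing


if __name__ == '__main__' and len(sys.argv) > 1:
    # Usage: python3 check_bcmaps.py <samples_file>
    for clones_file, bcmap_file in find_missing_barcode_maps(sys.argv[1]):
        print(f'missing {bcmap_file} (clones file: {clones_file})')
```

Saved as a (hypothetical) check_bcmaps.py, it would be run as `python3 check_bcmaps.py sampla_list.txt`; an empty output means every listed clones file has a mapping file under the assumed naming scheme.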

JingwenD commented 2 years ago

Hi Philip,

Thanks for your reply. Indeed, we made a mistake when creating the sample list table. The merge finished after we fixed it.

However, we ran into another issue when running run_conga.py:

Python 3.6.8 (default, Nov 16 2020, 16:55:22) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
Linux-3.10.0-1160.36.2.el7.x86_64-x86_64-with-centos-7.9.2009-Core
24 logical CPU cores, x86_64
-----
Session information updated at 2021-11-28 14:15

reading: /hpc/dla_lti/jdeng/scRNASeq/PBMC/merged_pbmc_gex.h5ad of type h5ad
/hpc/dla_lti/jdeng/sandbox/gitrepos/conga/conga/preprocess.py:221: DeprecationWarning: Use is_view instead of isview, isview will be removed in the future.
  if adata.isview: # ran into trouble with AnnData views vs copies
total barcodes: 91926 (91926, 36601)
Traceback (most recent call last):
  File "gitrepos/conga/scripts/run_conga.py", line 300, in <module>
    suffix_for_non_gene_features = args.suffix_for_non_gene_features,
  File "/hpc/dla_lti/jdeng/sandbox/gitrepos/conga/conga/preprocess.py", line 299, in read_dataset
    assert exists(kpca_file)
AssertionError

It seems to fail when loading the kpca information.

Should I remove the --no_kpca flag when running setup_10x_for_conga.py?

phbradley commented 2 years ago

Hi Jingwen, Glad the merging worked! You could either add --no_kpca when running run_conga.py or remove the --no_kpca flag from merge_samples.py (i.e., make sure that the kpca is run during the merging step). If the merged set is big (>10K clonotypes), the kpca on the merged dataset can take a long time and use a lot of memory, so running run_conga.py with --no_kpca is the way to go for really big datasets.
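The size rule of thumb above can be written down as a small helper that counts clonotypes in the merged clones file and decides whether to pass --no_kpca to run_conga.py. This is a sketch, not part of conga: `count_clonotypes`, `kpca_flags` and the 10,000 cutoff constant are hypothetical names, the cutoff is only the approximate ">10K clonotypes" guideline from the reply, and the clones file is assumed to have one header line followed by one clonotype per line.

```python
# Approximate guideline from the reply above: kpca on a merged set with
# more than ~10K clonotypes can take a long time and a lot of memory.
KPCA_CLONOTYPE_LIMIT = 10_000


def count_clonotypes(clones_file):
    """Count clonotypes in a clones TSV.

    ASSUMPTION: one header line followed by one clonotype per non-empty line.
    """
    with open(clones_file) as fh:
        next(fh, None)  # skip the (assumed) header line
        return sum(1 for line in fh if line.strip())


def kpca_flags(n_clonotypes, limit=KPCA_CLONOTYPE_LIMIT):
    """Return extra run_conga.py flags: skip kpca when the dataset is large."""
    return ['--no_kpca'] if n_clonotypes > limit else []
```

The returned list would be appended to whatever run_conga.py argument list you already use, e.g. `kpca_flags(count_clonotypes('merged_pbmc_clones.tsv'))`. The alternative from the reply, running kpca during the merge step instead, is not covered by this helper.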

Let me know if that's not clear. Take care, Phil

JingwenD commented 2 years ago

Hi Philip,

Thank you for the prompt reply. We solved the problem by adding --no_kpca when running run_conga.py. Unfortunately, the run then ended with an error caused by a library issue in our HPC environment:

running 1D UMAP gex
ran louvain clustering: louvain_gex
preprocess.cluster_and_tsne_and_umap:: X_pca_tcr is not present in adata.obsm; using exact tcrdist nbrs for umap and clustering
util.run_command: cmd= /hpc/dla_lti/jdeng/sandbox/gitrepos/conga/tcrdist_cpp/bin/find_neighbors -f tmp_tcrdists551_tcrs.tsv -n 10 -d /hpc/dla_lti/jdeng/sandbox/gitrepos/conga/tcrdist_cpp/db/tcrdist_info_human.txt -o tmp_tcrdists551_calc_tcrdist
/hpc/dla_lti/jdeng/sandbox/gitrepos/conga/tcrdist_cpp/bin/find_neighbors: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /hpc/dla_lti/jdeng/sandbox/gitrepos/conga/tcrdist_cpp/bin/find_neighbors)
/hpc/dla_lti/jdeng/sandbox/gitrepos/conga/tcrdist_cpp/bin/find_neighbors: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /hpc/dla_lti/jdeng/sandbox/gitrepos/conga/tcrdist_cpp/bin/find_neighbors)
/hpc/dla_lti/jdeng/sandbox/gitrepos/conga/tcrdist_cpp/bin/find_neighbors: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /hpc/dla_lti/jdeng/sandbox/gitrepos/conga/tcrdist_cpp/bin/find_neighbors)
find_neighbors failed: False False

I tried to rerun it on my local computer, but it could not handle such a large dataset.

Sorry to disturb you on the weekend.

Best,

Jingwen

phbradley commented 2 years ago

Hi Jingwen, I'm sorry! That's really too bad. I wonder whether compiling the C++ code within the cluster environment might help with that... Good luck! Take care, Phil
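The errors in the log say the prebuilt find_neighbors binary needs GLIBCXX_3.4.20, 3.4.21 and 3.4.26, which the system /lib64/libstdc++.so.6 does not provide. The Python banner shows GCC 4.8.5 on CentOS 7, so the system libstdc++ is likely older than those versions. Before recompiling, you can confirm which versions a library actually exports, roughly the equivalent of `strings /lib64/libstdc++.so.6 | grep GLIBCXX`, with the Python sketch below. It is a diagnostic aid, not part of conga, and the function names are hypothetical.

```python
import os
import re

# Matches symbol-version tags such as b'GLIBCXX_3.4.21' inside the binary.
_GLIBCXX_RE = re.compile(rb'GLIBCXX_(\d+(?:\.\d+)+)')


def _as_tuple(version):
    """Turn '3.4.21' (str or bytes) into (3, 4, 21) for comparison."""
    if isinstance(version, bytes):
        version = version.decode('ascii')
    return tuple(int(part) for part in version.split('.'))


def glibcxx_versions(lib_bytes):
    """Return the set of GLIBCXX versions found in a libstdc++ image."""
    return {_as_tuple(m.group(1)) for m in _GLIBCXX_RE.finditer(lib_bytes)}


def missing_versions(lib_bytes, required):
    """Return the required version strings the library does not provide."""
    found = glibcxx_versions(lib_bytes)
    return [v for v in required if _as_tuple(v) not in found]


if __name__ == '__main__':
    # Path and versions taken from the find_neighbors error in this thread.
    lib_path = '/lib64/libstdc++.so.6'
    required = ['3.4.20', '3.4.21', '3.4.26']
    if os.path.exists(lib_path):
        with open(lib_path, 'rb') as fh:
            print('missing:', missing_versions(fh.read(), required))
```

If versions are reported missing, rebuilding the C++ code on the cluster itself, as suggested above, is one way around the mismatch, since the rebuilt binary is linked against the toolchain available there.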

JingwenD commented 2 years ago

Hi Philip,

We got it working in another HPC environment, for both TCR and BCR data. Again, thanks for developing CoNGA. It is pretty cool!

Best, Jingwen

phbradley commented 2 years ago

OK, great! Glad it worked! Take care, Phil


From: JingwenD; Sent: Tuesday, November 30, 2021 9:35 AM; To: phbradley/conga; Cc: Bradley PhD, Phil; Subject: Re: [phbradley/conga] failed in merge_samples.py (Issue #33)
