nsalomonis / altanalyze

AltAnalyze is a multi-functional and easy-to-use software package for automated single-cell and bulk gene and splicing analyses. Easy-to-use precompiled graphical user-interface versions are available from our website.
http://www.altanalyze.org
Apache License 2.0

How to enable cell type annotation as in the previous AltAnalyze version #45

Open ewijaya opened 4 years ago

ewijaya commented 4 years ago

I have the following data downloadable here.

Now, I'm using the most recent version of AltAnalyze.

However, when I ran the following script:

ALTANALYZE=/home/ubuntu/storage2/Tools/altanalyze/AltAnalyze.py
/home/ubuntu/anaconda2/bin/python "$ALTANALYZE" \
    --runICGS yes \
    --expdir test_outdir \
    --platform RNASeq \
    --species Mm \
    --column_method hopach --rho 0.4 \
    --ExpressionCutoff 4 \
    --FoldDiff 3 \
    --SamplesDiffering 3 \
    --excludeCellCycle conservative

I cannot get this kind of plot, with the cell type assigned on the left, as in the previous version of AltAnalyze.

[Image: IMG_20200605_130618]

I have removed the old version and no longer know which previous version could create that plot. Please advise how I can go about it.

nsalomonis commented 4 years ago

Hi Edward, As you note, when running ICGS (currently version 2) with a command like the one you specified, the Guide3 results in the ICGS folder and the final NMF-defined clusters (marker genes visualized, typically with many more clusters) will also have cell-type predictions. Indeed, these are much better in the current version, which has marker genes for thousands of cell-type-specific signatures. First, I would confirm that you are getting the ICGS-NMF folder, which provides the primary results for ICGS2. The command you are using is fine, but it is typically too stringent for large droplet sequencing experiments. For example:

python AltAnalyze.py --platform RNASeq --species Mm --excludeCellCycle no --removeOutliers yes --ChromiumSparseMatrix /Users/saljh8/DemoData/mouse.h5 --output /Users/saljh8/DemoData/ --runICGS yes --expname test

The more verbose version of this command (displaying default options) is:

python AltAnalyze.py --platform RNASeq --species Mm --restrictBy protein_coding --excludeCellCycle no --removeOutliers yes --ChromiumSparseMatrix "/Users/saljh8/DemoData/mouse.h5" --output "/Users/saljh8/DemoData/" --runICGS yes --expname test --downsample 2500 --column_method hopach --column_metric cosine --rho 0.2 --ExpressionCutoff 1 --FoldDiff 4 --SamplesDiffering 4 --numVarGenes 500 --numGenesExp 500

ICGS2 applies a dynamic correlation cutoff which begins at 0.2 and increases by 0.1 if > 5000 correlated variable genes are obtained.
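
A minimal sketch of the cutoff rule described above, assuming the 0.1 increase repeats until the gene count drops to 5000 or fewer. The function name `choose_rho`, the `count_correlated_genes` callback, and the `max_rho` cap are hypothetical illustrations, not AltAnalyze's actual code:

```python
from typing import Callable


def choose_rho(count_correlated_genes: Callable[[float], int],
               start: float = 0.2,
               step: float = 0.1,
               max_genes: int = 5000,
               max_rho: float = 0.9) -> float:
    """Raise the correlation cutoff until few enough genes pass it.

    count_correlated_genes is a hypothetical callback that returns how many
    correlated variable genes are obtained at a given cutoff.
    max_rho is an assumed safety cap so the loop always terminates.
    """
    rho = start
    while rho < max_rho and count_correlated_genes(rho) > max_genes:
        # round() avoids floating-point drift such as 0.30000000000000004
        rho = round(rho + step, 10)
    return rho


if __name__ == "__main__":
    # Toy gene counts: too many genes below 0.35, few enough above it.
    toy_counts = lambda rho: 12000 if rho < 0.35 else 3000
    print(choose_rho(toy_counts))
```

With the toy counts shown, the cutoff is raised twice before the gene count falls under the limit.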

If you had a tab-delimited file with counts, you would add: --dataFormat counts

If you want to force a specific number of target clusters, add: --k 23

If the enriched blue terms do not display in these results (Guide3 in ICGS or FinalMarkerHeatmap in ICGS-NMF), I would assume there was an issue with downloading the GO-Elite database, which can be found in the software's AltDatabase folder under EnsMart72/goelite/Hs/gene-mapp/Ensembl-BioMarkers.txt
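
One way to check whether that database file was downloaded is a small script like the sketch below. The directory layout is taken from the path quoted above; the function names and the example install directory are hypothetical, and the assumption that other species use the same layout with a different species code (e.g. `Mm`) is not confirmed in this thread:

```python
from pathlib import Path


def biomarker_db_path(altanalyze_dir, ensmart="EnsMart72", species="Hs"):
    """Build the expected location of Ensembl-BioMarkers.txt.

    The layout follows the path quoted in the comment above; passing another
    species code is an assumption about how the folders are organized.
    """
    return (Path(altanalyze_dir) / "AltDatabase" / ensmart / "goelite"
            / species / "gene-mapp" / "Ensembl-BioMarkers.txt")


def has_biomarker_db(altanalyze_dir, ensmart="EnsMart72", species="Hs"):
    """Return True if the file exists and is not empty."""
    path = biomarker_db_path(altanalyze_dir, ensmart, species)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    # Hypothetical install location; replace with your own AltAnalyze folder.
    install_dir = "/home/ubuntu/storage2/Tools/altanalyze"
    print(biomarker_db_path(install_dir))
    print("found" if has_biomarker_db(install_dir) else "missing or empty")
```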

If present, you can try to add these cell-type enrichment results by finding the text file corresponding to the heatmap of interest (e.g., ICGS-NMF/FinalMarkerHeatmap.txt) and supplying the --clusterGOElite BioMarkers option with the hierarchical clustering command:

python AltAnalyze.py --image hierarchical --platform RNASeq --species Mm --display False --input "/Users/saljh8/DemoData/ICGS-NMF/FinalMarkerHeatmap.txt" --contrast 5 --color_gradient yellow_black_blue --column_method None --row_method None --column_metric cosine --row_metric correlation --normalization median --clusterGOElite BioMarkers

This uses the prior clustering rather than re-clustering (replace None with hopach to re-cluster). The printed output should indicate cell-type enrichments or report a specific error if something is missing.

ewijaya commented 4 years ago

Hi Nathan,

Thank you so much for your prompt response. Can you advise the exact command line I can use for the attached TSV matrix file as input? The TSV file is downloadable here.

Using my initial command line, I looked at the ICGS-NMF subdirectory, but it contains only one file, FinalGroups.txt.

Thanks and I hope to hear from you again.

E.

P.S. I can't find the example DemoData/ICGS-NMF/FinalMarkerHeatmap.txt in your GitHub.

nsalomonis commented 4 years ago

Hi Edward, The example path I used is a local path on my machine, but an example path in the GitHub repository (which was built with an older version of ICGS2, without graphics as nice) is: GitHub/altanalyze/DemoData/ICGS/10xGenomics/Mm-e14.5_Kidney-GSE104396/precomputed_results/ICGS-NMF

If your output contains FinalGroups.txt, you can post any errors from the log file that AltAnalyze produces in the designated output directory (AltAnalyze_timestamp.log). The errors should be close to the end of the file.
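
A short sketch for pulling the relevant part out of such a log, assuming errors are written as standard Python tracebacks. The function name `last_traceback` and the fallback of matching lines containing "Error" are illustrative choices, not part of AltAnalyze:

```python
from typing import List


def last_traceback(log_text: str) -> List[str]:
    """Return the lines of the last Python traceback found in a log.

    If no traceback header is present, fall back to every line that
    mentions "Error" (an assumed heuristic, not an AltAnalyze convention).
    """
    lines = log_text.splitlines()
    starts = [i for i, line in enumerate(lines)
              if line.startswith("Traceback (most recent call last)")]
    if starts:
        return lines[starts[-1]:]
    return [line for line in lines if "Error" in line]


if __name__ == "__main__":
    # Toy log text; in practice, read the AltAnalyze_timestamp.log file.
    sample = ("step 1 ok\n"
              "Traceback (most recent call last):\n"
              '  File "x.py", line 1, in f\n'
              "ValueError: bad\n")
    print("\n".join(last_traceback(sample)))
```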

The file you attached is formatted properly, but the extension should be changed to ".txt". For example:

python AltAnalyze.py --platform RNASeq --species Mm --excludeCellCycle no --removeOutliers yes --expdir "/Users/saljh8/DemoData/matrix.txt" --output "/Users/saljh8/DemoData/" --runICGS yes --expname test
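
If you prefer to keep the original TSV file untouched, a copy with the new extension can be made with a sketch like this one. The helper names `txt_path` and `copy_as_txt` and the example file name are hypothetical:

```python
import shutil
from pathlib import Path


def txt_path(src):
    """Return the same path with its extension replaced by .txt."""
    return Path(src).with_suffix(".txt")


def copy_as_txt(src):
    """Copy src next to itself with a .txt extension and return the new path.

    The content is copied byte for byte; only the file name changes.
    """
    dst = txt_path(src)
    shutil.copyfile(src, dst)
    return dst


if __name__ == "__main__":
    # Hypothetical input name; only the target name is computed here.
    print(txt_path("/Users/saljh8/DemoData/matrix.tsv"))
```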

ewijaya commented 4 years ago

Hi Nathan,

Thank you for your reply. I tried the following command:

/home/ubuntu/storage2/Tools/altanalyze/AltAnalyze.py --platform RNASeq --species Mm --excludeCellCycle no --removeOutliers yes --runICGS yes --expdir /home/ubuntu/storage2/tmp/test_altanalyze/output/rnaseq_altanalyze_fc2/ExpressionInput/result.txt --output /home/ubuntu/storage2/tmp/test_altanalyze/output --rho 0.4 --column_method None --expname rnaseq_altanalyze_fc2 --ExpressionCutoff 1 --FoldDiff 2 --SamplesDiffering 3

The input file result.txt can be downloaded here and the log file here.

I still cannot produce a plot with cell-type assignments in the ICGS-NMF directory.

As you will notice in the log file, there are some errors. I'm not sure whether they are caused by my data or by a real bug in the code:

  File "/home/ubuntu/storage2/Tools/altanalyze/stats_scripts/ICGS_NMF.py", line 1049, in CompleteICGSWorkflow
    NMFinput,Rank=NMF_Analysis.FilterGuideGeneFile(Guidefile,Guidefile_block,processedInputExpFile,iteration,platform,uniqueIDs,symbolIDs)
  File "/home/ubuntu/storage2/Tools/altanalyze/stats_scripts/NMF_Analysis.py", line 114, in FilterGuideGeneFile
    rank_Count=int(q[n-1])
ValueError: invalid literal for int() with base 10: 'NA'
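
The last line of the traceback says that `int()` was called on the string 'NA'. The sketch below reproduces that failure in isolation and shows one tolerant way to parse such a field. `parse_rank` is a hypothetical helper for illustration only; it is not AltAnalyze's code, and whether 'NA' comes from the input data or from an earlier step is not established here:

```python
from typing import Optional


def parse_rank(value: str, missing=("NA", "")) -> Optional[int]:
    """Convert a text field to int, returning None for missing markers.

    The set of missing markers is an assumption made for this sketch.
    """
    value = value.strip()
    if value in missing:
        return None
    return int(value)


if __name__ == "__main__":
    try:
        int("NA")  # the same kind of call that fails in the traceback
    except ValueError as err:
        print("reproduced:", err)
    print(parse_rank("7"), parse_rank("NA"))
```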

Thanks and I hope to hear from you again. E.