immunomind / immunarch

🧬 Immunarch: an R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires
https://immunarch.com
Apache License 2.0

Problem in reading metadata #298

Open Mussa1122 opened 1 year ago

Mussa1122 commented 1 year ago

Hi

Thanks for your software. I do not know why I am not able to merge my metadata.

I should mention that my two bulk samples are MiXCR output and my two sc (single-cell) samples are TRUST4 output.

> head(metadata)
                                       Sample Platform
1                                       TCR_1     Bulk
2                                       TCR_2     Bulk
3 TRUST_PN0340_0001_GEX_S3_L001_R2_001_report     sc5'
4 TRUST_PN0340_0002_GEX_S3_L001_R2_001_report     sc5'
> list.files()
 [1] "AddSequenceToCDR3File.pl"                        "barcoderep-expand.py"                           
 [3] "barcoderep-filter.py"                            "GetFullLengthAssembly.pl"                       
 [5] "metadata.txt"                                    "README.md"                                      
 [7] "scripts"                                         "TCR_1.txt"                                      
 [9] "TCR_2.txt"                                       "TRUST_PN0340_0001_GEX_S3_L001_R2_001_report.tsv"
[11] "TRUST_PN0340_0002_GEX_S6_L001_R2_001_report.tsv" "trust-barcoderep-to-10X.pl"                     
[13] "trust-cluster.py"                                "trust-stats.py"                                 
> immdata <- repLoad(file.path(getwd()))

== Step 1/3: loading repertoire files... ==

Processing "/data/Continuum/Angel/TRUST4-master/scripts" ...
  -- [1/13] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/AddSequenceToCDR3File.pl" -- unsupported format, skipping
  -- [2/13] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/barcoderep-expand.py" --
  -- [3/13] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/barcoderep-filter.py" --
  -- [4/13] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/GetFullLengthAssembly.pl" -- unsupported format, skipping
  -- [5/13] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/metadata.txt" -- metadata
  -- [6/13] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/README.md" -- unsupported format, skipping                                                    
  -- [7/13] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/TCR_1.txt" -- mixcr
  -- [8/13] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/TCR_2.txt" -- mixcr                                                                           
  -- [9/13] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/TRUST_PN0340_0001_GEX_S3_L001_R2_001_report.tsv" -- vdjtools                                  
  -- [10/13] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/TRUST_PN0340_0002_GEX_S6_L001_R2_001_report.tsv" -- vdjtools                                 
  -- [11/13] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/trust-barcoderep-to-10X.pl" --
  -- [12/13] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/trust-cluster.py" -- unsupported format, skipping
  -- [13/13] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/trust-stats.py" -- unsupported format, skipping
Processing "/data/Continuum/Angel/TRUST4-master/scripts/scripts" ...
  -- [1/8] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/scripts/AddSequenceToCDR3File.pl" -- unsupported format, skipping
  -- [2/8] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/scripts/barcoderep-expand.py" --
  -- [3/8] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/scripts/barcoderep-filter.py" --
  -- [4/8] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/scripts/GetFullLengthAssembly.pl" -- unsupported format, skipping
  -- [5/8] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/scripts/README.md" -- unsupported format, skipping
  -- [6/8] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/scripts/trust-barcoderep-to-10X.pl" --
  -- [7/8] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/scripts/trust-cluster.py" -- unsupported format, skipping
  -- [8/8] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/scripts/trust-stats.py" -- unsupported format, skipping

== Step 2/3: checking metadata files and merging files... ==

Processing "/data/Continuum/Angel/TRUST4-master/scripts" ...
  -- Samples found in the metadata, but not in the folder:
     TRUST_PN0340_0002_GEX_S3_L001_R2_001_report
  Did you correctly specify all the sample names in the metadata file?
  -- Samples found in the folder, but not in the metadata:
     TRUST_PN0340_0002_GEX_S6_L001_R2_001_report
  Did you add all the necessary samples to the metadata file with correct names?
  Creating dummy sample records in the metadata for now...
Processing "/data/Continuum/Angel/TRUST4-master/scripts/scripts" ...
  -- Metadata file not found; creating a dummy metadata...

== Step 3/3: processing paired chain data... ==

Done!

Warning messages:
1: Unknown or uninitialised column: `Sample`. 
2: Unknown or uninitialised column: `Sample`. 

(Attachment: metadata.txt)

Any help, please?

Thanks a lot in advance.

Alexander230 commented 1 year ago

Hi, @Mussa1122!

I'm Aleksandr Popov, a developer of the Immunarch package. Thank you for using our software!

It looks like the folder you are loading the data from (/data/Continuum/Angel/TRUST4-master/scripts) contains not only the data but also other files (.py, .pl, .md, etc.). Immunarch doesn't support this; please move all your data and metadata files to a separate directory and pass the path to that directory as the argument to repLoad (instead of file.path(getwd())).
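A minimal base-R sketch of that separation, assuming the file names from the list.files() output in this issue. The helper name stage_repertoire_files and the target directory name repertoire_data are made up for illustration, and the repLoad call is left commented out so the snippet runs without Immunarch installed:

```r
# Hypothetical helper: copy only the repertoire and metadata files
# into a clean folder and return what ended up there.
stage_repertoire_files <- function(src_dir, dest_dir, files) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  # Only copy the files that really exist in the source folder
  present <- files[file.exists(file.path(src_dir, files))]
  file.copy(file.path(src_dir, present), dest_dir, overwrite = TRUE)
  list.files(dest_dir)
}

# File names taken from the list.files() output in this issue
wanted <- c("metadata.txt", "TCR_1.txt", "TCR_2.txt",
            "TRUST_PN0340_0001_GEX_S3_L001_R2_001_report.tsv",
            "TRUST_PN0340_0002_GEX_S6_L001_R2_001_report.tsv")

data_dir <- file.path(getwd(), "repertoire_data")  # assumed directory name
staged <- stage_repertoire_files(getwd(), data_dir, wanted)
print(staged)

# Then load from the clean folder only:
# library(immunarch)
# immdata <- repLoad(data_dir)
```

Copying rather than moving keeps the original TRUST4 folder intact; the scripts in it are simply never seen by repLoad.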

Best regards, Aleksandr

Mussa1122 commented 1 year ago

Sorry, I do not know why it says that the samples in the folder are not the same as those in the metadata.

> list.files()
[1] "metadata.txt" "TCR_1.txt"    "TCR_2.txt"    "TRUST1.tsv"   "TRUST2.tsv"  
>
> metadata=read.delim("metadata.txt")
> head(metadata)
      Sample Platform
1  TCR_1.txt     Bulk
2  TCR_2.txt     Bulk
3 TRUST1.tsv       sc
4 TRUST2.tsv       sc
> 
> immdata <- repLoad(.path="/data/Continuum/Angel/TRUST4-master/scripts/tcr/")

== Step 1/3: loading repertoire files... ==

Processing "/data/Continuum/Angel/TRUST4-master/scripts/tcr/" ...
  -- [1/5] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/tcr//metadata.txt" -- metadata
  -- [2/5] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/tcr//TCR_1.txt" -- mixcr
  -- [3/5] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/tcr//TCR_2.txt" -- mixcr
  -- [4/5] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/tcr//TRUST1.tsv" -- vdjtools
  -- [5/5] Parsing "/data/Continuum/Angel/TRUST4-master/scripts/tcr//TRUST2.tsv" -- vdjtools

== Step 2/3: checking metadata files and merging files... ==

Processing "/data/Continuum/Angel/TRUST4-master/scripts/tcr/" ...
  -- Samples found in the metadata, but not in the folder:
     TCR_1.txtTCR_2.txtTRUST1.tsvTRUST2.tsv
  Did you correctly specify all the sample names in the metadata file?
  -- Samples found in the folder, but not in the metadata:
     TCR_1TCR_2TRUST1TRUST2
  Did you add all the necessary samples to the metadata file with correct names?
  Creating dummy sample records in the metadata for now...

== Step 3/3: processing paired chain data... ==

Done!

>

During visualization, it seems that one column of the metadata is not recognized:

> exp_vol <- repExplore(immdata$data, .method = "volume")
> p1 <- vis(exp_vol, .by = c("Platform"), .meta = immdata$meta)
Error in (function (x, cutpoints = c(0.3, 0.6, 0.8, 0.9, 0.95), symbols = if (numeric.x) c(" ",  : 
  argument "x" is missing, with no default
> p2 <- vis(exp_vol, .by = c("Sample", "Platform"), .meta = immdata$meta)
Warning message:
Ignoring unknown aesthetics: xmin, xmax, annotations, y_position 
> p1 + p2
>

(Attached plot: Rplot01)

Any help, please?

Alexander230 commented 1 year ago

Hi, @Mussa1122!

I think you can try removing the file extensions from the sample names in metadata.txt. Name them like TCR_1, not TCR_1.txt; the repLoad function will then match the files with your sample names. I will record this as a bug to fix in a future version of Immunarch, so that sample names with extensions will eventually be supported too.
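As a rough sketch of that renaming in base R: the metadata table below is typed in from the read.delim() output above, and the folder sample names are copied from the repLoad log, so nothing here reads your actual files. The tab-separated layout used in the commented-out write.table call is an assumption about metadata.txt:

```r
# Metadata as shown in the read.delim() output above
metadata <- data.frame(
  Sample   = c("TCR_1.txt", "TCR_2.txt", "TRUST1.tsv", "TRUST2.tsv"),
  Platform = c("Bulk", "Bulk", "sc", "sc"),
  stringsAsFactors = FALSE
)

# Drop the file extension so that "TCR_1.txt" becomes "TCR_1"
metadata$Sample <- tools::file_path_sans_ext(metadata$Sample)

# Sample names repLoad reported for the files in the folder (from the log above)
folder_samples <- c("TCR_1", "TCR_2", "TRUST1", "TRUST2")

# Both differences should be empty once the names agree
print(setdiff(metadata$Sample, folder_samples))
print(setdiff(folder_samples, metadata$Sample))

# Write the corrected table back (assumes metadata.txt is tab-separated)
# write.table(metadata, "metadata.txt", sep = "\t", quote = FALSE, row.names = FALSE)
```

The same setdiff check is a quick way to spot any other mismatch between the Sample column and the file names before calling repLoad again.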

Best regards, Aleksandr

Mussa1122 commented 1 year ago

Sorry, I got confused by the results of two functions; please have a look.

I have two healthy PBMC samples.

I profiled those two samples with conventional bulk RNA-seq, 10x 5' GEX, 10x BCR and 10x TCR.

I then ran TRUST4 on my FASTQ files (paired-end).

Finally, I visualized the TRUST4 report.tsv files with your software.

Why is the number of unique clonotypes so different between these two functions? The first says that we get more unique clonotypes with the TCR kit, and the other says that we get more with bulk RNA-seq.

exp_vol <- repExplore(immdata$data, .method = "volume")

vis(exp_vol, .by = c("Sample", "platform"), .meta = immdata$meta)

Rplot

exp_vol <- repExplore(immdata$data, .method = "volume")
by_vec <- c("GEX", "GEX", "BCR", "BCR","TCR", "TCR","Bulk", "Bulk")
p <- vis(exp_vol, .by = by_vec)
p

Rplot01

> imm_raref <- repDiversity(immdata$data, "raref", .verbose = F)
> p1 <- vis(imm_raref)
> p2 <- vis(imm_raref, .by = "Sample", .meta = immdata$meta)
> p1 + p2

Rplot

Any help, please?