vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Libraries from MaxQuant ? #480

Open videlc opened 2 years ago

videlc commented 2 years ago

Hey Vadim,

Since this is my first message here, thanks for developing such amazing software. I'm wondering whether there is a (convenient) way to use a MaxQuant output file (e.g. msms.txt) as a spectral library. I've read here and there that this can be done with Skyline, or perhaps diapysef, but neither is really straightforward and I couldn't find any tutorial.

I could write an R script to "convert" msms.txt into a working speclib, but if there's a simpler way, I'd take it.

Thanks, Vivian

vdemichev commented 2 years ago

Hi Vivian,

Yes, MaxQuant msms.txt can be used as a library by DIA-NN, although this is experimental functionality, so I recommend exporting it to DIA-NN's .tsv format and checking that everything looks right. Alternatively, you can format the list of peptides detected by MaxQuant as a FASTA file and make an in silico library from that. Or you can use FragPipe instead of MaxQuant and use its library directly in DIA-NN.

Best, Vadim
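
A rough illustration of the second suggestion above (formatting the MaxQuant peptide list as FASTA), as a minimal base-R sketch. The helper name `peptides_to_fasta` and the `pep_N` entry names are hypothetical, and the toy vector stands in for the `Sequence` column that msms.txt is assumed to contain. It only writes the FASTA file; generating the in silico library from it is left to DIA-NN.

```r
# Hypothetical helper (not part of DIA-NN or MaxQuant):
# write each unique peptide sequence as its own FASTA entry.
peptides_to_fasta <- function(peptides, file_out) {
  # drop missing/empty values and duplicates
  peptides <- unique(peptides[!is.na(peptides) & nzchar(peptides)])
  stopifnot(length(peptides) > 0)
  # entry names pep_1, pep_2, ... are arbitrary placeholders
  headers <- sprintf('>pep_%d', seq_along(peptides))
  # interleave header and sequence lines: header1, seq1, header2, seq2, ...
  writeLines(as.vector(rbind(headers, peptides)), file_out)
  invisible(peptides)
}

# toy input standing in for the (assumed) Sequence column of msms.txt
toy_peptides <- c('PEPTIDEK', 'ELVISLIVESK', 'PEPTIDEK', NA)
out_file <- tempfile(fileext = '.fasta')
peptides_to_fasta(toy_peptides, out_file)
print(readLines(out_file))
```

With a real file, the toy vector would be replaced by the peptide column read from msms.txt.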

videlc commented 2 years ago

Posting here an R workaround that selects the leading accessions from msms.txt and writes a corresponding FASTA file, for anyone who is interested.

Note that library-free mode is slower but gives better results than this solution. Note also that this code may be suboptimal and inelegant.


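library(tidyverse)
library(janitor)
library(seqinr)

# read every msms.txt found under the working directory and concatenate them
# (pattern is a regular expression, so the dot is escaped and the end anchored)
msmsfiles<-list.files(pattern = 'msms\\.txt$',recursive = TRUE)
allmsms<-tibble()

for(file in msmsfiles){

    # clean_names() turns e.g. 'Proteins' into 'proteins'
    temp<-read_delim(file,delim = '\t') %>% clean_names()

    allmsms<-bind_rows(allmsms,temp)

}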

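#selection of leading accession (first entry of the semicolon-separated list)
allprots<-allmsms %>% select(proteins)
allprots$proteins <- gsub(';.*','',allprots$proteins)
#deduplicate after truncation, otherwise the same leading accession can repeat
allprots<-allprots %>% unique()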

#fasta import as a two-column table: sequence (V1) and full header
fasta_list<-read.fasta('path/to/fasta.fasta',
                 seqtype = 'AA',
                 whole.header = TRUE,
                 as.string = TRUE)

fasta<-tibble(V1 = as.character(unlist(fasta_list)),
              header = names(fasta_list))

#getting uniprot accession number from header
fasta$acc<-gsub('sp\\|','',fasta$header)
fasta$acc<-gsub('tr\\|','',fasta$acc)
fasta$acc<-gsub('\\|.*','',fasta$acc)
fasta$acc<-gsub('CON__','',fasta$acc)

fasta$rown<-1:nrow(fasta)
short_fasta<-tibble()

#selecting proteins that are in allmsms table (from msms.txt)
short_fasta<-fasta %>% filter(acc %in% allprots$proteins)

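#write fasta with only msms.txt leading accession proteins
#deduplicate whole rows so that sequences and headers stay paired
#(calling unique() on each separately can misalign them when two proteins share a sequence)
short_fasta<-short_fasta %>% distinct(header,.keep_all = TRUE)
write.fasta(sequences = as.list(short_fasta$V1),names = short_fasta$header,file.out = 'test.fasta')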

Vivian
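
The matching logic of the script above can be checked on toy data before running it on real files. This is a minimal base-R sketch, not the script itself: the helper names `leading_accession` and `header_accession` are hypothetical, the accessions and headers are made up, and UniProt-style `sp|ACC|NAME` headers are assumed.

```r
# Hypothetical helpers, base R only.

# first accession of a semicolon-separated protein group
leading_accession <- function(proteins) sub(';.*', '', proteins)

# accession from a UniProt-style header such as 'sp|P12345|AAA_HUMAN ...'
header_accession <- function(header) {
  acc <- sub('^(sp|tr)\\|', '', header)  # drop the database prefix
  sub('\\|.*', '', acc)                  # keep the text before the next '|'
}

# made-up stand-ins for the proteins column and the FASTA headers
toy_proteins <- c('P12345;Q67890', 'P12345', 'O11111')
toy_headers <- c('sp|P12345|AAA_HUMAN Protein A',
                 'tr|Q67890|BBB_HUMAN Protein B',
                 'sp|O11111|CCC_HUMAN Protein C')

wanted <- unique(leading_accession(toy_proteins))
keep <- header_accession(toy_headers) %in% wanted
print(toy_headers[keep])
```

Q67890 only ever appears as a non-leading group member in the toy input, so its entry is not kept.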