Usage of PR tag is unclear

bifxcore commented 1 year ago

The README says: Use the tag PR:xxx.fa to specify paired protein/RNA.

The code says "Merge MSAs based on taxonomy ID", which I believe makes sense in my case (binding tRNA to synthetase dimer, all from same organism source) but I cannot figure out how to specify the PR flag in the input.

run_RF2NA.sh never tests for [ $type = 'PR' ], so passing a PR: input to that script does not seem to be handled.

What goes in that 'PR:xxx.fa' fasta file? Concatenation of protein and RNA sequence? What's the sequence separator?

fdimaio commented 1 year ago

Hello, good question.

There are two parts to the protocol, the setup (via run_RF2NA.sh) and the model prediction (via predict.py).

The 'PR:' notation is only used as an input in model prediction (via predict.py).

To run the full pipeline (via run_RF2NA.sh) with a paired prediction, use the P:xxx.fa R:xxx.fa notation. The block:

############################################################
# Merge MSAs based on taxonomy ID
############################################################
if [ $nP -eq 1 ] && [ $nD -eq 0 ] && [ $nR -eq 1 ]
then
    echo "Creating joint Protein/RNA MSA"
    echo " -> Running command: $PIPEDIR/input_prep/make_rna_msa.sh $seqfile $WDIR $tag $CPU $MEM"
    $PIPEDIR/input_prep/make_pMSAs_prot_RNA.py $WDIR/$lastP.msa0.a3m $WDIR/$lastR.afa $WDIR/$lastP.$lastR.a3m &> /dev/null 
    argstring="PR:$WDIR/$lastP.$lastR.a3m:$WDIR/$lastP.hhr:$WDIR/$lastP.atab"
fi

will make the joint MSA and set the PR: argument (argstring) for you.

PKUfjh commented 9 months ago

One addition: if you do not put a type prefix such as "P:" in front of an input FASTA file, the file is treated as a protein.
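
To make the two rules above concrete, here is a minimal sketch of how prefixed inputs could be counted and checked against the `nP`/`nD`/`nR` condition from the quoted block. The function `count_inputs` and the file names are hypothetical and are not taken from run_RF2NA.sh; the sketch assumes only the P:, R: and D: prefixes and the "no prefix means protein" default described in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of run_RF2NA.sh): count inputs by type prefix
# and report whether the joint protein/RNA MSA condition would be met.
count_inputs() {
    local nP=0 nR=0 nD=0 arg type
    for arg in "$@"; do
        case "$arg" in
            P:*) type=P ;;
            R:*) type=R ;;
            D:*) type=D ;;
            *)   type=P ;;   # assumption from this thread: no prefix means protein
        esac
        case "$type" in
            P) nP=$((nP + 1)) ;;
            R) nR=$((nR + 1)) ;;
            D) nD=$((nD + 1)) ;;
        esac
    done
    # Same test as the quoted block: exactly one protein, no DNA, exactly one RNA.
    if [ "$nP" -eq 1 ] && [ "$nD" -eq 0 ] && [ "$nR" -eq 1 ]; then
        echo "paired nP=$nP nR=$nR nD=$nD"
    else
        echo "unpaired nP=$nP nR=$nR nD=$nD"
    fi
}

# Example with hypothetical file names:
count_inputs P:synthetase.fa R:tRNA.fa
```

Under these assumptions, one protein file plus one RNA file reports "paired", while a second protein input (for example, two separate chains) would not satisfy the condition as quoted.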

uw-ipd / RoseTTAFold2NA

Usage of PR tag is unclear #50