plotly / dash-bio

Open-source bioinformatics components for Dash
https://dash-gallery.plotly.host/Portal/?search=Bioinformatics
MIT License

reading in fasta file from dcc.Upload() #529

Closed: nyck33 closed this issue 4 years ago

nyck33 commented 4 years ago

The uploaded file arrives base64-encoded, so I did the following, based on the example code for dcc.Upload:

    import base64

    from Bio import SeqIO

    def parse_contents(contents, filename, date):
        content_type, content_string = contents.split(',')
        print(f'name:{type(filename)}\n{filename}\n')
        print(f'type:{type(content_type)}\n{content_type}\n')
        print(f'string:{type(content_string)}\n{content_string}\n')
        decoded = base64.b64decode(content_string)
        print(f'decoded:{decoded}\n')
        seq1 = SeqIO.parse(decoded, "fasta")
        print(f'seq parsed {seq1}')
        seq_str = str(decoded)
        print(f'seq_str: {seq_str}')
        # split into lines for output as P's
        seq_arr = seq_str.split('\n')
        # replace \n
        seq_arr = [x.replace("\n", " ") for x in seq_arr]
        for line in seq_arr:
            print(line)

I get:

OSError: [Errno 36] File name too long: b'>WP_011143314.1 L-lactate dehydrogenase [Gloeobacter violaceus]\nMQDRLFVSMEHPRALPETDLIKGAIVGAGAVGMAIAYSMLIQNTFDELVLVDIDRRKVEGEVMDLVHGIP\nFVEPSVVRAGTLADCRGVDVVVITAGARQREGETRLSLVQRNVEIFRGLIGEIMEHCPNAILLVVSNPVD\nVMTYVAMKLAGLPPSRVIGSGTVLDTARFRYLLAERLRVDPRSLHAYIIGEHGDSEVPVWSRANVAGAFL\nSEIEPAVGTPDDPAKMFEVFEHVKNAAYEIIERKGATSWAIGL...
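The traceback shows SeqIO.parse() calling open() on the bytes object, i.e. it treats a str or bytes first argument as a path to a file, not as FASTA data. A likely fix, sketched below under the assumption that Biopython is installed and that the upload is UTF-8 text, is to decode the bytes and wrap the text in an in-memory io.StringIO handle, which SeqIO.parse() accepts like an open file. The filename and date parameters are kept only to match the original signature.

```python
import base64
import io

from Bio import SeqIO


def parse_contents(contents, filename, date):
    """Parse a FASTA file uploaded through dcc.Upload into SeqRecord objects.

    `contents` is assumed to be the data-URI string dcc.Upload passes to the
    callback: a header such as "data:...;base64", a comma, then the payload.
    """
    content_type, content_string = contents.split(",", 1)
    decoded = base64.b64decode(content_string)  # raw bytes of the uploaded file
    # Passing `decoded` straight to SeqIO.parse() makes it try to open a file
    # whose *name* is the whole FASTA text, hence "File name too long".
    # Decode to str and wrap it in a file-like handle instead.
    handle = io.StringIO(decoded.decode("utf-8"))
    return list(SeqIO.parse(handle, "fasta"))
```

Each returned record exposes record.id, record.description and record.seq; str(record.seq) gives the sequence as a single string with the FASTA line breaks removed.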

Without SeqIO, I get a string from decoded, but I cannot strip() or replace() the newlines.

string:<class 'str'>
PldQXzAxMTE0MzMxNC4xIEwtbGFjdGF0ZSBkZWh5ZHJvZ2VuYXNlIFtHbG9lb2JhY3RlciB2aW9sYWNldXNdCk1RRFJMRlZTTUVIUFJBTFBFVERMSUtHQUlWR0FHQVZHTUFJQVlTTUxJUU5URkRFTFZMVkRJRFJSS1ZFR0VWTURMVkhHSVAKRlZFUFNWVlJBR1RMQURDUkdWRFZWVklUQUdBUlFSRUdFVFJMU0xWUVJOVkVJRlJHTElHRUlNRUhDUE5BSUxMVlZTTlBWRApWTVRZVkFNS0xBR0xQUFNSVklHU0dUVkxEVEFSRlJZTExBRVJMUlZEUFJTTEhBWUlJR0VIR0RTRVZQVldTUkFOVkFHQUZMClNFSUVQQVZHVFBERFBBS01GRVZGRUhWS05BQVlFSUlFUktHQVRTV0FJR0xBVFRRSVZSQUlUUk5RTlJWTFBWU1ZMTVNHTEgKR0lFRVZDTEFZUEFWTE5SUUdJRFJMVktGU0xTUEdFRUVRTFFSU0FSVk1SUVRMREdJUUYKCg==

decoded:b'>WP_011143314.1 L-lactate dehydrogenase [Gloeobacter violaceus]\nMQDRLFVSMEHPRALPETDLIKGAIVGAGAVGMAIAYSMLIQNTFDELVLVDIDRRKVEGEVMDLVHGIP\nFVEPSVVRAGTLADCRGVDVVVITAGARQREGETRLSLVQRNVEIFRGLIGEIMEHCPNAILLVVSNPVD\nVMTYVAMKLAGLPPSRVIGSGTVLDTARFRYLLAERLRVDPRSLHAYIIGEHGDSEVPVWSRANVAGAFL\nSEIEPAVGTPDDPAKMFEVFEHVKNAAYEIIERKGATSWAIGLATTQIVRAITRNQNRVLPVSVLMSGLH\nGIEEVCLAYPAVLNRQGIDRLVKFSLSPGEEEQLQRSARVMRQTLDGIQF\n\n'

seq_str: b'>WP_011143314.1 L-lactate dehydrogenase [Gloeobacter violaceus]\nMQDRLFVSMEHPRALPETDLIKGAIVGAGAVGMAIAYSMLIQNTFDELVLVDIDRRKVEGEVMDLVHGIP\nFVEPSVVRAGTLADCRGVDVVVITAGARQREGETRLSLVQRNVEIFRGLIGEIMEHCPNAILLVVSNPVD\nVMTYVAMKLAGLPPSRVIGSGTVLDTARFRYLLAERLRVDPRSLHAYIIGEHGDSEVPVWSRANVAGAFL\nSEIEPAVGTPDDPAKMFEVFEHVKNAAYEIIERKGATSWAIGLATTQIVRAITRNQNRVLPVSVLMSGLH\nGIEEVCLAYPAVLNRQGIDRLVKFSLSPGEEEQLQRSARVMRQTLDGIQF\n\n'

seq_arr:
b'>WP_011143314.1 L-lactate dehydrogenase [Gloeobacter violaceus]\nMQDRLFVSMEHPRALPETDLIKGAIVGAGAVGMAIAYSMLIQNTFDELVLVDIDRRKVEGEVMDLVHGIP\nFVEPSVVRAGTLADCRGVDVVVITAGARQREGETRLSLVQRNVEIFRGLIGEIMEHCPNAILLVVSNPVD\nVMTYVAMKLAGLPPSRVIGSGTVLDTARFRYLLAERLRVDPRSLHAYIIGEHGDSEVPVWSRANVAGAFL\nSEIEPAVGTPDDPAKMFEVFEHVKNAAYEIIERKGATSWAIGLATTQIVRAITRNQNRVLPVSVLMSGLH\nGIEEVCLAYPAVLNRQGIDRLVKFSLSPGEEEQLQRSARVMRQTLDGIQF\n\n'
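The seq_str output above starts with b' because str() on a bytes object returns its repr, in which every newline is written as the two characters backslash and "n". split('\n') and replace("\n", ...) then find no real newline character to act on. A minimal sketch of the usual fix, assuming UTF-8 input, is to call bytes.decode() instead of str(); the helper name fasta_lines is hypothetical.

```python
import base64


def fasta_lines(contents):
    """Return the non-empty text lines of a FASTA file uploaded via dcc.Upload.

    Hypothetical helper: `contents` is assumed to be the data-URI string
    ("<header>,<base64 payload>") that dcc.Upload hands to the callback.
    """
    _header, payload = contents.split(",", 1)
    decoded = base64.b64decode(payload)  # bytes, e.g. b'>WP_...\nMQD...'
    # str(decoded) would give "b'>WP_...\\nMQD...'" (literal backslash-n),
    # whereas decode() produces real text with real newline characters.
    text = decoded.decode("utf-8")
    # splitlines() also handles Windows "\r\n" endings; blank lines are dropped.
    return [line for line in text.splitlines() if line.strip()]
```

Each returned line could then be wrapped in an html.P element for display, as the comment in the original snippet intends.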

jackparmer commented 4 years ago

Please check out community.plotly.com for usage questions like this. You're more likely to get help over there.