mittinatten / freesasa

C-library for calculating Solvent Accessible Surface Areas
http://freesasa.github.io/
MIT License

Trying to use freesasa on each file in a folder of 3,311 .pdb1 files, works individually but shows reading error when using os.listdir. #42

Closed emroberts95 closed 5 years ago

emroberts95 commented 5 years ago

Hi, I don't have a strong computer science background, so if this is not the place for this question, please feel free to delete this or point me to where I should be asking. I'm trying to use freesasa to calculate the SASA of many flavoproteins stored as files in a folder.

I'm using Python 3.7.2 with Atom source code editor on a Windows machine.

I can get a freesasa program to work by typing in a filename individually but when I try to put it in a function in a "for" loop I keep getting this error from the Command Prompt:

C:\Users\myname\Desktop> python 00freesasaloop.py
FreeSASA:lib\src\structure.c:679: error: input had no valid ATOM or HETATM lines
FreeSASA:lib\src\structure.c:686: error:
Traceback (most recent call last):
  File "00freesasaloop.py", line 24, in <module>
    my_func(f)
  File "00freesasaloop.py", line 13, in my_func
    structure = freesasa.Structure(filename)
  File "freesasa.pyx", line 494, in freesasa.Structure.__init__
Exception: Error reading '1akq.pdb1'.

Here is my code:

import freesasa
import os
inputdir = 'C:\\Users\\myname\\Desktop\\testunzip'

def my_func(filename):

    structure = freesasa.Structure(filename)
    result = freesasa.calc(structure)
    area_classes = freesasa.classifyResults(result,structure)

    print("This is the SASA of", filename)
    print("Total : %.2f A2" % result.totalArea())
    for key in area_classes:
        print( key, ": %.2f A2" % area_classes[key])  

for f in os.listdir(inputdir):
    my_func(f)

I've tried changing which folder the python program is saved in and which file is first so I know it's not just "1akq.pdb1" that is not working. I've also added

open(f, 'r+')

under my "for" loop and that didn't seem to make a difference.

The long-term goal of this program is to calculate these surface areas and then put the values into a database with other flavoprotein information. In the very long term, I will possibly adapt the program's calculations around the FAD/FMN coenzymes specifically. Any direction you could provide would be much appreciated. Thanks in advance! - Emily

mittinatten commented 5 years ago

Hi, I am not sure I can help you. This could be a problem with file permissions, or there could be something wrong with your file. I tried 1akq in the freesasa demo at https://freesasa.github.io/demo/, and that worked fine at least. Have you modified the file somehow?

emroberts95 commented 5 years ago

I don't think I've modified the file. The only thing I've done since downloading the files from the Protein Data Bank website is unzip the .gz files using 7-Zip for Windows.

It is a biological assembly file, so it's "1akq.pdb1" rather than "1akq.pdb".

However, both file types work (and match the demo's outputs) when I input the name of the individual file in the command line and run the program using this code:

import freesasa
name = input("Please type the 4 digit code of the protein followed by .pdb ex. '2DOR.pdb1'")
print(name)

structure = freesasa.Structure(name)
result = freesasa.calc(structure)
area_classes = freesasa.classifyResults(result,structure)

print("This is the SASA of", name)
print("Total : %.2f A2" % result.totalArea())
for key in area_classes:  # keys are apolar and polar?
    print( key, ": %.2f A2" % area_classes[key])

mittinatten commented 5 years ago

OK, so then it must be something related to how you load your files. Are you running the script from the same directory as where the files are located? If not, the script probably can't read them. Have you tried using the full path for the files instead?
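
A minimal sketch of the full-path suggestion, not a confirmed fix from this thread: os.listdir() returns bare file names, so each name has to be joined with its directory before it is passed to freesasa.Structure(). The helper names list_structure_paths, print_sasa and run_all, and the ".pdb1" suffix filter, are hypothetical and chosen only for illustration; the freesasa calls are the ones already used in the code posted above.

```python
import os


def list_structure_paths(inputdir, suffix=".pdb1"):
    """Return full paths of the files in inputdir whose names end with suffix."""
    paths = []
    for name in sorted(os.listdir(inputdir)):
        # os.listdir() yields bare names such as '1akq.pdb1', so join each
        # one with the directory it came from before using it.
        full_path = os.path.join(inputdir, name)
        if name.endswith(suffix) and os.path.isfile(full_path):
            paths.append(full_path)
    return paths


def print_sasa(path):
    """Print the total and per-class SASA for one structure file."""
    import freesasa  # imported here so the path helper works without freesasa

    structure = freesasa.Structure(path)
    result = freesasa.calc(structure)
    area_classes = freesasa.classifyResults(result, structure)
    print("This is the SASA of", path)
    print("Total : %.2f A2" % result.totalArea())
    for key in area_classes:
        print(key, ": %.2f A2" % area_classes[key])


def run_all(inputdir):
    """Run print_sasa on every matching file in inputdir."""
    for path in list_structure_paths(inputdir):
        print_sasa(path)
```

With this layout, something like run_all(r"C:\Users\myname\Desktop\testunzip") should behave the same no matter which directory the script is started from, because every path handed to freesasa already includes the folder.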

mittinatten commented 5 years ago

Closing due to inactivity. Feel free to reopen if you have more questions!

emroberts95 commented 5 years ago

Hi mittinatten, thanks for your help - I ended up using glob.glob(path) rather than os.listdir(inputdir), and it is working now.
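
For anyone landing here later, a rough sketch of why the glob approach probably helps: glob.glob() returns each match with the directory part of the pattern still attached, whereas os.listdir() returns only the file names. The exact pattern used in the working script isn't shown in this thread, so the "*.pdb1" pattern and the helper name glob_structure_paths below are assumptions.

```python
import glob
import os


def glob_structure_paths(inputdir, pattern="*.pdb1"):
    """Return the paths in inputdir that match pattern, directory included."""
    # Joining the directory into the pattern means every match comes back
    # as inputdir + file name, so it can be opened from any working directory.
    return sorted(glob.glob(os.path.join(inputdir, pattern)))
```

Each returned path can then be passed directly to freesasa.Structure() inside the loop.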

mittinatten commented 5 years ago

Good to hear!