Closed jillianwhiting closed 6 years ago
I think the flexibility issue here is with Numpy, not Pandas. I would advise against converting the entire data frame into a Numpy array. Instead, I would convert individual columns of the data frame as you need them. There may be a more elegant solution, but that is what I have done. For example, let's say you have a column of influent turbidity values that you want to convert into an array, and the header in the .csv reads "Influent Turbidity (NTU)". In that case, what I would do is:
# Import units
from aide_design.play import *
# Import datalog
file = pd.read_csv('datalog 2-13-2018.xls',delimiter='\t') # My datalog files are saved as .xls
# Extract column from "file" data frame using the column header ("Influent Turbidity (NTU)")
# Extract values ('.values') from the column (this makes it a Numpy array)
# Multiply by assigned units.
Inf = file["Influent Turbidity (NTU)"].values*u.NTU
If you have non-numerical values in a column (e.g., a header row is added in the middle of your data), you can use Pandas commands to find these values and replace them with something that can be handled by Numpy (e.g., NaN).
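One way to do that replacement is sketched below. It is only an illustration, not necessarily how the code above was used: the in-memory sample data is made up, and `pd.to_numeric` with `errors="coerce"` is one of several Pandas options for turning unparseable entries into NaN. Units are left out so the sketch runs without aide_design.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical tab-delimited datalog with a header row repeated mid-file.
raw = (
    "Time (s)\tInfluent Turbidity (NTU)\n"
    "0\t5.1\n"
    "5\t5.3\n"
    "Time (s)\tInfluent Turbidity (NTU)\n"
    "10\t4.9\n"
)
data = pd.read_csv(io.StringIO(raw), delimiter="\t")

# errors="coerce" replaces anything that cannot be parsed as a number with NaN.
turbidity = pd.to_numeric(data["Influent Turbidity (NTU)"], errors="coerce")

# Keep the NaN placeholders, or drop them before converting to a Numpy array.
values = turbidity.to_numpy()
clean_values = turbidity.dropna().to_numpy()
```

Keeping the NaN entries preserves row alignment with other columns; dropping them is simpler if the column is analyzed on its own.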
The updated method to import AguaClara code is
from aide_design.play import *
This one line is equivalent to
@monroews Thank you, I have updated my code accordingly! It's nice to see how all of this is evolving.
This works only if everything in the text file is a number. If the file contains something that is not a number, you need to remove it, either before importing the file or, if it is in the top rows, by adding a header input that says how many lines at the top should not be imported. The code below would not import the first row of the file.
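A minimal sketch of skipping the first row follows. The original snippet is not shown in this thread, so this assumes the import was done with Numpy's `np.genfromtxt`, whose `skip_header` argument gives the number of lines to skip at the top. The sample data is made up for illustration.

```python
import io

import numpy as np

# Hypothetical tab-delimited file contents with one header line at the top.
raw = "Time (s)\tInfluent Turbidity (NTU)\n0\t5.1\n5\t5.3\n10\t4.9\n"

# skip_header=1 tells genfromtxt not to import the first line of the file.
data = np.genfromtxt(io.StringIO(raw), delimiter="\t", skip_header=1)

# Each column can then be pulled out by index.
turbidity = data[:, 1]
```

If the import was done with `np.loadtxt` instead, the corresponding argument is `skiprows`.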