Open twocs opened 7 years ago
To get this file in the same format as the brain_body.txt, you can follow this syntax:
dataframe = pd.read_csv('challenge_dataset.txt', sep=',', header=None, names=['X', 'Y'])
You can see the difference between read_fwf and read_csv here: https://cl.ly/0D1e2s2u2d0J
If I recall correctly, the header handling also differs. Demo.txt has column headings, but challenge_dataset.txt has none, so it needs header=None, as you've shown.
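To make the read_fwf versus read_csv difference concrete, here is a minimal sketch. It assumes the challenge file is plain comma-separated text with no header row, and it uses three sample rows quoted elsewhere in this thread, loaded from an in-memory string rather than the real file.

```python
import io

import pandas as pd

# Sample rows in the comma-separated layout discussed in this thread
# (an in-memory stand-in for challenge_dataset.txt, not the real file).
csv_text = "6.1101,17.592\n5.5277,9.1302\n8.5186,13.662\n"

# read_csv splits each line on commas, giving two numeric columns.
df_csv = pd.read_csv(io.StringIO(csv_text), header=None, names=["X", "Y"])

# read_fwf infers column boundaries from runs of whitespace. These lines
# contain none, so each whole line ends up in a single text column.
df_fwf = pd.read_fwf(io.StringIO(csv_text), header=None)

print(df_csv.shape)
print(df_fwf.shape)
```

If that assumption about the file layout holds, read_fwf is the wrong reader for challenge_dataset.txt even though it works for the whitespace-aligned brain_body.txt.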
I think I spent most of the time figuring out how to read the csv file and access the data, with only a little effort for the linear regression.
Would a link to pandas read_csv in the Readme be helpful or is there a more accessible source of info? The scikit-learn tutorial that is linked in the Readme doesn't seem to use pandas.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
Where should I push my code?
Just replace
dataframe = pd.read_fwf('brain_body.txt')
with
dataframe = pd.read_csv('challenge_dataset.txt',names=('x','y'))
and run it.
The point is not that it is possible for someone who knows how to use pandas to solve the I/O issues. The point is that there is no link to learn about pandas. If we are following the demo, we would try to use:
dataframe = pd.read_csv('challenge_dataset.txt')
But this is a problem, because the first line is inferred to be a header rather than numerical data, so one data point is silently dropped. That will affect the result of the linear regression. We must therefore use:
dataframe = pd.read_csv('challenge_dataset.txt', header=None)
and may access the data as follows:
x_values = dataframe[[0]]
y_values = dataframe[[1]]
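The header problem described above can be checked with a short sketch. It assumes a headerless, comma-separated file and uses three sample rows from this thread, read from an in-memory string instead of the real challenge_dataset.txt.

```python
import io

import pandas as pd

# Stand-in for a headerless challenge_dataset.txt (sample rows only).
csv_text = "6.1101,17.592\n5.5277,9.1302\n8.5186,13.662\n"

# Default header='infer': the first data row is consumed as column names.
inferred = pd.read_csv(io.StringIO(csv_text))

# header=None: every line is kept as data; columns are labelled 0 and 1.
explicit = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(inferred.columns))  # the first data row, as strings
print(len(inferred), len(explicit))

# Double brackets keep a 2-D DataFrame, which scikit-learn's fit() expects.
x_values = explicit[[0]]
y_values = explicit[[1]]
```

With the default settings the frame has one row fewer than the file has lines, which is easy to miss on a larger dataset.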
In working this out, I consulted the pandas documentation, but it is hard to navigate because read_csv has so many optional parameters. That difficulty in finding the right call to pandas.read_csv is why I filed this issue. To fix it, I propose the same changes I suggested earlier in this report.
Here is the function signature for pandas.read_csv from the documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
pandas.read_csv(filepath_or_buffer, sep=', ', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=False, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, skip_footer=0, doublequote=True, delim_whitespace=False, as_recarray=False, compact_ints=False, use_unsigned=False, low_memory=True, buffer_lines=None, memory_map=False, float_precision=None)
For what it's worth, this is the official guide for pandas I/O, which is slightly more useful: http://pandas.pydata.org/pandas-docs/stable/io.html#io-read-csv-table
You can add header=0
dataframe = pd.read_csv('challenge_dataset.txt', header=0, names=["x", "y"])
It works for me with that.
My challenge_dataset.txt:
x,y
6.1101,17.592
5.5277,9.1302
8.5186,13.662
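For a copy of the file that does start with an x,y line, as in the excerpt above, the following sketch shows why header=0 is the right choice there. It assumes exactly that layout and reads the excerpt from an in-memory string.

```python
import io

import pandas as pd

# Stand-in for a challenge_dataset.txt that begins with an "x,y" header line.
csv_text = "x,y\n6.1101,17.592\n5.5277,9.1302\n8.5186,13.662\n"

# header=0 marks line 0 as the header; names= then replaces those labels.
replaced = pd.read_csv(io.StringIO(csv_text), header=0, names=["x", "y"])

# header=None on the same text treats "x,y" as a data row, so the
# columns can no longer be parsed as numbers.
as_data = pd.read_csv(io.StringIO(csv_text), header=None, names=["x", "y"])

print(len(replaced), replaced["x"].dtype)
print(len(as_data), as_data["x"].dtype)
```

So header=None suits a headerless file, while header=0 suits a file with a heading line; which one applies depends on the copy of the dataset you have.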
LR - challenge_dataset.txt
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
#read data
dataframe = pd.read_csv('challenge_dataset.txt', sep=',', header=None, names=['X', 'Y'])
x_values = dataframe[['X']]
y_values = dataframe[['Y']]
#train model on data
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values, y_values)
#visualize results
plt.scatter(x_values, y_values)
plt.plot(x_values, body_reg.predict(x_values))
plt.show()
The challenge.txt data file was in a different format than the brain_body.txt data file used for demo.py.
To make this work much more straightforward, I'd suggest either: