jesuscastor commented 5 years ago

Instructions

The more information/context you provide, the better we (the instructors and class) can collectively assist you.
Please fill out all of the relevant sections below. Incomplete questions will not be addressed by the instruction team.
Questions asked without showing an attempt to answer individually will also not be addressed.
It is bad etiquette to ask things such as "I don't know how to do this, can you tell me how?" or "I can't get this code to work, what is wrong?" if you do not provide the context and/or screenshots necessary to help.

Question (continued)

I am having trouble using the mean function in numpy. After creating the lists I wanted, I converted the values to strings, because the variable inspector only showed that the objects were of type list (not specifically strings). I then created the list of lists (lst_ge), built an array from it, and tried to use .astype to convert it to float, with no success. I do not understand why I get this error.

lst_ge_AD_str = [str(i) for i in lst_ge_AD] 
lst_ge_C_str=[str(i) for i in lst_ge_C] 
lst_ge= [lst_ge_AD_str, lst_ge_C_str]
array=np.array(lst_ge).astype(float)
array.mean(axis=0)
print(array)

What do you have?

Use this space to describe what resources/functions/code you have that you feel may be relevant to answering this question. Additionally, if this is a coding question, describe what input this problem section of code requires. By "input", we mean what type of data it is.

- .astype could help to convert to float
- float() could help to create a list of floats instead of a list of strings
- dtype inside the .array function could help to convert to float

Stack Overflow showed another command that could help with this:

x=np.array(['1.1', '2.2', '3.3'])
x=np.asfarray(x,float)

Or I could do a for loop that selects the values and creates a list of floats
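For reference, here is a minimal sketch of the two alternatives mentioned above, run on the same three-element array from the Stack Overflow snippet. Note that np.asfarray was removed in NumPy 2.0, so .astype(float) is the more portable choice. The names as_float and looped are only illustrative.

```python
import numpy as np

x = np.array(['1.1', '2.2', '3.3'])

# Option 1: let NumPy convert the whole array of numeric strings at once
as_float = x.astype(float)

# Option 2: an explicit loop (here a comprehension) that builds a list of floats
looped = [float(value) for value in x]

print(as_float)
print(looped)
```

Both options only work when every element is a single numeric string; neither can convert an element that is itself a list.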

What do you need?

Use this space to describe what resources/functions/code/data you believe is required to determine if this question has been solved.

I think the output for lst_ge needs to be a float to be able to do the .mean operation.

I found another post on GitHub suggesting that the array may not be well formed in terms of the number of rows vs. columns, which might be the case here, as the control data has 8 columns and AD has 9. I did not think this was an issue, because the mean function should calculate the mean of the 9 columns and output one value, then calculate the mean of the 8 columns and output one value.

https://github.com/numpy/numpy/issues/6584
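A small sketch of the row-length concern, using made-up numbers rather than the real expression data: two rows of different lengths (9 and 8 values) cannot be stacked into one rectangular float array, but the mean of each row can still be computed separately.

```python
import numpy as np

# Made-up illustrative values, not the real data
ad_row = [1.0, 0.8, 1.1, 1.5, 2.2, 1.3, 1.7, 1.3, 1.9]  # 9 values
control_row = [0.8, 3.1, 1.4, 1.3, 1.3, 1.0, 1.2, 1.1]  # 8 values

# Stacking rows of unequal length into a float array raises ValueError
try:
    np.array([ad_row, control_row], dtype=float)
    stacked_ok = True
except ValueError as err:
    stacked_ok = False
    print("Cannot stack rows of unequal length:", err)

# Each row can still be averaged on its own
ad_mean = np.mean(ad_row)
control_mean = np.mean(control_row)
print(ad_mean, control_mean)
```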

# Paste your code in this code block (if needed)

What have you done?

Use this space to describe what approaches you have done in your attempts to solve this question yourself. This is the *most important* part of your question.

I have tried:
- converting the list to float before converting to an array
- converting to float while calling the array function
- doing a for loop to select only floats
- using asfarray

If this is just a general question

Describe what your hypothesis of what the answer is here

If this is a coding question

We require you to post your code, error messages, and output of your code here.

User-written code

# Paste your code here

Observed error messages

I get this error if I don't convert to float, which makes me think that is the issue:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-124-fafdad7e45aa> in <module>
      3 lst_ge= [lst_ge_AD_str, lst_ge_C_str]
      4 array=np.array(lst_ge)#.astype(float)
----> 5 array.mean(axis=0)
      6 #print(array)
      7 

~\Anaconda3\lib\site-packages\numpy\core\_methods.py in _mean(a, axis, dtype, out, keepdims)
     85             ret = ret.dtype.type(ret / rcount)
     86     else:
---> 87         ret = ret / rcount
     88 
     89     return ret

TypeError: unsupported operand type(s) for /: 'list' and 'int'

I get this error when I try to convert to float

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-125-956141314d44> in <module>
      2 lst_ge_C_str=[str(i) for i in lst_ge_C]
      3 lst_ge= [lst_ge_AD_str, lst_ge_C_str]
----> 4 array=np.array(lst_ge).astype(float)
      5 array.mean(axis=0)
      6 #print(array)

ValueError: setting an array element with a sequence.

Observed output of code (other than error messages)

Here I am including the data from the variable inspector:


Name | Type | Size | Shape | Content
-- | -- | -- | -- | --
lst_ge | list | 80 | 2 | [['0.0',  "['1.000632206', '0.849654278', '1.143676123', '1.510414254',  '2.234817557', '1.312646817', '1.759139574', '1.354627234',  '1.927143429']", "[ ...
lst_ge_AD | list | 34616 | 3883 | [0.0,  ['1.000632206', '0.849654278', '1.143676123', '1.510414254',  '2.234817557', '1.312646817', '1.759139574', '1.354627234',  '1.927143429'], ['0.115 ...
lst_ge_AD_str | list | 33928 | 3883 | ['0.0',  "['1.000632206', '0.849654278', '1.143676123', '1.510414254',  '2.234817557', '1.312646817', '1.759139574', '1.354627234',  '1.927143429']", "[' ...
lst_ge_C | list | 33928 | 3882 | [['0.847255151',  '3.154229039', '1.486141203', '1.311721839', '1.306761802',  '1.058905867', '1.249770005', '1.116947401'], ['0.037374936', '0',  '0.052 ...
lst_ge_C_str | list | 33928 | 3882 | ["['0.847255151',  '3.154229039', '1.486141203', '1.311721839', '1.306761802',  '1.058905867', '1.249770005', '1.116947401']", "['0.037374936', '0', '0.  ...
lst_genes | list | 33928 | 3882 | ['A1BG',  'NAT2', 'SERPINA3', 'AADAC', 'AAMP', 'AANAT', 'AARS', 'ABAT', 'ABCA1',  'ADA', 'ADAM8', 'ADAM10', 'ADAR', 'ADARB1', 'ADARB2', 'ADCY1', 'ADCY2'  ...
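
The preview above appears to show lst_ge_AD starting with a bare 0.0 before its first inner list, and holding 3883 entries against 3882 in lst_ge_C. If that reading is right, a quick check like the following sketch could locate entries that are not lists. The list lst_example is a made-up stand-in for the real data.

```python
# Made-up stand-in that mimics the shape seen in the inspector preview
lst_example = [0.0, ['1.000632206', '0.849654278'], ['0.115', '0.2']]

# Collect the index and type name of every element that is not a list
not_lists = [
    (index, type(item).__name__)
    for index, item in enumerate(lst_example)
    if not isinstance(item, list)
]
print(not_lists)
```

An empty result would mean every element is a list, as a list of lists should be.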

Checklist

Replace the space in between the brackets with an X

- [x] Identified what you already have available to address this question
- [x] Identified what you need to reach an acceptable answer
- [x] Shown what you have done up to this point
- [ ] A hypothesis if this is a general question
- [x] Your code, error messages, and output if this is a coding question

mitreac commented 5 years ago

The following two statements are a good way to practice comprehensions. However, lst_ge_AD and lst_ge_C are supposed to be lists of lists, so the following statements would apply the str function to a list (each inner list) rather than to individual values. Also, these statements are not needed here: lst_ge_AD and lst_ge_C are expected to already have string values, and from what I can see they do: lst_ge_AD ['1.000632206', '0.849654278' ....

lst_ge_AD_str = [str(i) for i in lst_ge_AD] 
lst_ge_C_str=[str(i) for i in lst_ge_C]
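
To illustrate the point about str being applied to a list, here is a minimal sketch using the first two values visible in the inspector output. The variable names row, as_text, and converted are only illustrative.

```python
row = ['1.000632206', '0.849654278']

# str() on a list returns the text of the whole list, brackets and quotes included
as_text = str(row)
print(repr(as_text))

# That text is not a number, so float() cannot parse it
try:
    float(as_text)
    parsed = True
except ValueError as err:
    parsed = False
    print("float() failed:", err)

# Converting the individual values inside the list works as expected
converted = [float(value) for value in row]
print(converted)
```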

lst_ge will be a list like lst_ge_AD or lst_ge_C, not a list containing two such lists. As you can see in the last function, compute_row_average will be called on each of these lists separately.

ge_mean_C = compute_row_average(lst_ge_C)
ge_mean_AD = compute_row_average(lst_ge_AD)
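
The body of compute_row_average is not shown in this thread, so the following is only a hypothetical sketch of what such a function might look like, assuming it receives a list of lists of numeric strings and returns one mean per inner list. Because each row is converted on its own, rows of different lengths are not a problem.

```python
import numpy as np

def compute_row_average(rows):
    # Hypothetical implementation: the assignment's real function is not shown here
    averages = []
    for row in rows:
        values = np.array(row).astype(float)  # one row of numeric strings -> floats
        averages.append(float(values.mean()))
    return averages

# Made-up test data with rows of different lengths
lst_small = [['1.5', '2', '3'], ['4', '5']]
row_means = compute_row_average(lst_small)
print(row_means)
```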

So, the following statement that uses the control and AD lists to create a list containing both of them is not needed.

lst_ge= [lst_ge_AD_str, lst_ge_C_str]

Below is example code that transforms a list of lists into a 2 by 3 array and computes different means. This can help you better understand the mean function, and it lets you test code on a small list rather than on a large one.

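import numpy as np

# A small list of lists of numeric strings, standing in for the real data
test_list = [['1.5', '2', '3'], ['4', '5', '6']]
test_array = np.array(test_list).astype(float)  # 2 by 3 float array
means_all = test_array.mean()          # mean of all six values
means_axis0 = test_array.mean(axis=0)  # one mean per column
means_axis1 = test_array.mean(axis=1)  # one mean per row
print(means_all)
print(means_axis0)
print(means_axis1)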
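
As a sketch of how the axis argument behaves on the same 2 by 3 data, the shapes of the results can be checked and the row means compared against a plain Python calculation. The names col_means, row_means, and manual_row_means are only illustrative.

```python
import numpy as np

test_array = np.array([['1.5', '2', '3'], ['4', '5', '6']]).astype(float)

# axis=0 averages down the rows, giving one value per column (3 values)
col_means = test_array.mean(axis=0)
# axis=1 averages across the columns, giving one value per row (2 values)
row_means = test_array.mean(axis=1)

print(col_means.shape, row_means.shape)

# The same row means computed without NumPy, for comparison
manual_row_means = [sum(row) / len(row) for row in test_array.tolist()]
print(np.allclose(row_means, manual_row_means))
```

For a gene-by-sample layout like the one in this question, the per-row mean (axis=1) is the one that gives a single value per gene.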

nicolegobo commented 5 years ago

Thank you for the quick one-hour response on a Saturday - impressive. Hopefully the test_list you pasted will help me solve my question. If not, I will post. ^^