mitreac / b575f19

UM DCMB BIOINF 575 Fall 2019 class repo
BSD 3-Clause "New" or "Revised" License

[QUESTION] issues with mean function #21

Open jesuscastor opened 5 years ago

jesuscastor commented 5 years ago

Instructions


Question (continued)

I am having trouble using the mean function in NumPy. After creating the lists I wanted, I converted the values to strings, because the variable inspector only showed that the objects were of type list (not specifically strings). I then created the list of lists (lst_ge), built an array from it, and tried to use .astype to convert it to float, with no success. I do not understand why I get this error.

lst_ge_AD_str = [str(i) for i in lst_ge_AD] 
lst_ge_C_str=[str(i) for i in lst_ge_C] 
lst_ge= [lst_ge_AD_str, lst_ge_C_str]
array=np.array(lst_ge).astype(float)
array.mean(axis=0)
print(array)

What do you have?

Use this space to describe what resources/functions/code you have that you feel may be relevant to answering this question. Additionally, if this is a coding question, describe what input this problem section of code requires. By "input", we mean what type of data it is.

.astype could help to convert to float.
float() could help to create a list of floats instead of a list of strings.
The dtype argument of the np.array function could help to convert to float.

Stack Overflow showed another command that could help with this:

x=np.array(['1.1', '2.2', '3.3'])
x=np.asfarray(x,float)

Or I could do a for loop that selects the values and creates a list of floats
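The conversion options listed above can be compared side by side. This is a minimal sketch on made-up values, not the class solution. Note that np.asfarray was removed in NumPy 2.0, so the sketch uses np.asarray with a float dtype in its place.

```python
import numpy as np

# Made-up sample values, mirroring the Stack Overflow snippet above.
x = np.array(['1.1', '2.2', '3.3'])

# Option 1: convert an existing array of numeric strings with astype.
via_astype = x.astype(float)

# Option 2: np.asarray with a float dtype (stands in for np.asfarray,
# which no longer exists in NumPy 2.0 and later).
via_asarray = np.asarray(x, dtype=float)

# Option 3: build a list of floats first, then create the array.
via_loop = np.array([float(value) for value in ['1.1', '2.2', '3.3']])

print(via_astype.dtype, via_asarray.dtype, via_loop.dtype)
```

All three produce the same float array for a flat list of numeric strings, so none of them alone explains an error on nested data.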


What do you need?

Use this space to describe what resources/functions/code/data you believe is required to determine if this question has been solved.

I think the output for lst_ge needs to be a float to be able to do the .mean operation.

I found another post on GitHub suggesting that the array might not be well formed in terms of the number of rows vs. columns, which might be the case here, as the control data has 8 columns and AD has 9. I did not think this was an issue, because I expected the mean function to calculate the mean of the 9 columns and give one output value, and likewise calculate the mean of the 8 columns and output one value.

https://github.com/numpy/numpy/issues/6584
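One way to check whether a list is rectangular before building an array is to inspect the row lengths and look for elements that are not lists at all. The variable inspector output further down shows lst_ge_AD starting with a bare 0.0, so the sketch below includes a stray scalar on purpose. The helper name describe_rows and the sample data are hypothetical.

```python
# Made-up miniature of the data: one stray scalar, then rows of strings.
rows_AD = [0.0, ['1.0', '0.8', '1.1'], ['0.1', '0.2', '0.3']]
rows_C = [['0.8', '3.1'], ['0.0', '0.1']]

def describe_rows(rows):
    """Return (set of row lengths, indices of elements that are not lists)."""
    lengths = {len(r) for r in rows if isinstance(r, list)}
    not_lists = [i for i, r in enumerate(rows) if not isinstance(r, list)]
    return lengths, not_lists

lengths_AD, stray_AD = describe_rows(rows_AD)
lengths_C, stray_C = describe_rows(rows_C)
print(lengths_AD, stray_AD)
print(lengths_C, stray_C)
```

A single row length and an empty list of stray indices suggest the data can be turned into a regular 2-D array.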

# Paste your code in this code block (if needed)

What have you done?

Use this space to describe what approaches you have done in your attempts to solve this question yourself. This is the most important part of your question.

I have tried converting the list to float before converting to an array.
I have tried converting to float while calling the array function.
I have tried doing a for loop to select only floats.
I have tried using asfarray.

If this is just a general question

Describe what your hypothesis of what the answer is here

If this is a coding question

We require you to post your code, error messages, and output of your code here.

User-written code

# Paste your code here

Observed error messages

I get this error if I don't convert to float, which makes me think that is the issue

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-124-fafdad7e45aa> in <module>
      3 lst_ge= [lst_ge_AD_str, lst_ge_C_str]
      4 array=np.array(lst_ge)#.astype(float)
----> 5 array.mean(axis=0)
      6 #print(array)
      7 

~\Anaconda3\lib\site-packages\numpy\core\_methods.py in _mean(a, axis, dtype, out, keepdims)
     85             ret = ret.dtype.type(ret / rcount)
     86     else:
---> 87         ret = ret / rcount
     88 
     89     return ret

TypeError: unsupported operand type(s) for /: 'list' and 'int'

I get this error when I try to convert to float

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-125-956141314d44> in <module>
      2 lst_ge_C_str=[str(i) for i in lst_ge_C]
      3 lst_ge= [lst_ge_AD_str, lst_ge_C_str]
----> 4 array=np.array(lst_ge).astype(float)
      5 array.mean(axis=0)
      6 #print(array)

ValueError: setting an array element with a sequence.
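Both errors can be reproduced on a tiny example when the two inner lists differ in length, which appears to be the case here: the variable inspector reports 3883 and 3882 elements. This is a sketch under that assumption, with made-up values. Passing dtype=object explicitly is also an assumption of the sketch: older NumPy versions built such an object array implicitly, while recent versions refuse to do so.

```python
import numpy as np

# Made-up stand-ins for lst_ge_AD_str and lst_ge_C_str: two lists of
# strings with different lengths.
group_a = ['1.0', '2.0', '3.0']
group_b = ['4.0', '5.0']

# The result is a 1-D array holding two Python lists, not a 2-D array.
ragged = np.array([group_a, group_b], dtype=object)
print(ragged.shape)

# mean() adds the two lists (concatenation), then tries list / int.
try:
    ragged.mean(axis=0)
except TypeError as err:
    mean_error = err
    print("TypeError:", err)

# astype(float) cannot turn a whole list into a single float.
try:
    ragged.astype(float)
except ValueError as err:
    cast_error = err
    print("ValueError:", err)
```

Read this way, neither error is about strings versus floats as such. Both come from the array elements being whole lists.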

Observed output of code (other than error messages)

Here I am including the data from the variable inspector:


Name | Type | Size | Shape | Content
-- | -- | -- | -- | --
lst_ge | list | 80 | 2 | [['0.0', "['1.000632206', '0.849654278', '1.143676123', '1.510414254', '2.234817557', '1.312646817', '1.759139574', '1.354627234', '1.927143429']", "[ ...
lst_ge_AD | list | 34616 | 3883 | [0.0, ['1.000632206', '0.849654278', '1.143676123', '1.510414254', '2.234817557', '1.312646817', '1.759139574', '1.354627234', '1.927143429'], ['0.115 ...
lst_ge_AD_str | list | 33928 | 3883 | ['0.0', "['1.000632206', '0.849654278', '1.143676123', '1.510414254', '2.234817557', '1.312646817', '1.759139574', '1.354627234', '1.927143429']", "[' ...
lst_ge_C | list | 33928 | 3882 | [['0.847255151', '3.154229039', '1.486141203', '1.311721839', '1.306761802', '1.058905867', '1.249770005', '1.116947401'], ['0.037374936', '0', '0.052 ...
lst_ge_C_str | list | 33928 | 3882 | ["['0.847255151', '3.154229039', '1.486141203', '1.311721839', '1.306761802', '1.058905867', '1.249770005', '1.116947401']", "['0.037374936', '0', '0. ...
lst_genes | list | 33928 | 3882 | ['A1BG', 'NAT2', 'SERPINA3', 'AADAC', 'AAMP', 'AANAT', 'AARS', 'ABAT', 'ABCA1', 'ADA', 'ADAM8', 'ADAM10', 'ADAR', 'ADARB1', 'ADARB2', 'ADCY1', 'ADCY2' ...

Checklist

Replace the space in between the brackets with an X

mitreac commented 5 years ago

The following two statements are a good way to practice comprehensions. However, lst_ge_AD and lst_ge_C are supposed to be lists of lists, so these statements would apply the str function to each inner list as a whole rather than to the individual values. Also, these statements are not needed here: lst_ge_AD and lst_ge_C are expected to already contain string values, and from what I can see they do: lst_ge_AD ['1.000632206', '0.849654278' ....

lst_ge_AD_str = [str(i) for i in lst_ge_AD] 
lst_ge_C_str=[str(i) for i in lst_ge_C] 
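The effect of calling str on an inner list can be seen on a two-row sample. This is a small sketch with made-up values. The nested comprehension is one possible way to convert every value, not necessarily the approach the assignment expects.

```python
# Made-up two-row sample shaped like lst_ge_C.
rows = [['0.5', '1.5'], ['2.0', '4.0']]

# str() on each inner list produces one string per row, not per value.
as_str = [str(row) for row in rows]
print(as_str[0])        # prints: ['0.5', '1.5']
print(type(as_str[0]))  # a single str, brackets and quotes included

# Converting each value needs a nested comprehension instead.
as_float = [[float(value) for value in row] for row in rows]
print(as_float)         # prints: [[0.5, 1.5], [2.0, 4.0]]
```

This matches the variable inspector entries for the _str lists, where each element is a quoted copy of a whole row.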

lst_ge will be a list like lst_ge_AD or lst_ge_C, not a list containing two such lists. As you can see in the last function, compute_row_average will be called on each of these lists separately.

ge_mean_C = compute_row_average(lst_ge_C)
ge_mean_AD = compute_row_average(lst_ge_AD)

So, the following statement that uses the control and AD lists to create a list containing both of them is not needed.

lst_ge= [lst_ge_AD_str, lst_ge_C_str]
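The per-group call pattern described above might look like the sketch below. The body of compute_row_average is hypothetical, since the assignment's actual function is not shown in this thread. The sketch assumes each input is a list of equal-length rows of numeric strings, and the sample values are made up.

```python
import numpy as np

def compute_row_average(rows):
    """Hypothetical sketch: return the mean of each row as a 1-D array."""
    values = np.array(rows).astype(float)  # 2-D float array, one row per inner list
    return values.mean(axis=1)             # one mean per row

# Made-up rows; the two groups may have different numbers of columns
# because each group is converted and averaged on its own.
lst_ge_C = [['1', '2', '3'], ['4', '5', '6']]
lst_ge_AD = [['1', '3'], ['5', '7']]

ge_mean_C = compute_row_average(lst_ge_C)
ge_mean_AD = compute_row_average(lst_ge_AD)
print(ge_mean_C)
print(ge_mean_AD)
```

Because each group gets its own array, the 8 versus 9 column difference between control and AD never has to fit into one rectangular structure.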

Below is example code that transforms a list of lists into a 2 by 3 array and computes different means. This can help you better understand the mean function, and it lets you test code on a small list rather than a large one.

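import numpy as np

# Small list of lists of numeric strings: 2 rows, 3 columns
test_list = [['1.5', '2', '3'], ['4', '5', '6']]
# Convert to a 2 by 3 float array
test_array = np.array(test_list).astype(float)
means_all = test_array.mean()          # mean of all six values
means_axis0 = test_array.mean(axis=0)  # one mean per column (3 values)
means_axis1 = test_array.mean(axis=1)  # one mean per row (2 values)
print(means_all)
print(means_axis0)
print(means_axis1)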

nicolegobo commented 5 years ago

Thank you for the quick 1 hour response on a Saturday - impressive. Hopefully the testlist you pasted will help me solve my question. If not, I will post. ^^

jesuscastor commented 5 years ago

Thank you, Cristina. I was able to correct the error and better understand how the mean function behaves.