isdsucph / isds2021

Introduction to Social Data Science 2021 - a summer school course https://isdsucph.github.io/isds2021/
MIT License

Ex. 0.4.1 #8

Open Casper17-max opened 3 years ago

Casper17-max commented 3 years ago

I'm not familiar with the ftp link format. I can't open it, and I can't find a good guide on how to do so.

joachimkrasmussen commented 3 years ago

Hi Casper,

Maybe this page (link here) will help you when it comes to reading csv-files.

When it comes to finding the right url, follow the link that was given to you in the assignment (some browsers might struggle here as far as I remember). Right click on the link that is associated with the relevant year and copy the url. This is the url that you will need.

Does this make sense?

Best, Joachim

Casper17-max commented 3 years ago

Hi, I found a guide on opening the ftp link, but I don't know which app to use to download or open '1863.csv.gz' so that I can see the data in a format I am familiar with, such as Excel. I followed a guide to write the following code, but it doesn't seem quite right. [image]

joachimkrasmussen commented 3 years ago

Hi Casper,

All you need is the relevant url. In practice, you don't even have to open the file in your browser. Just get the url and put it where you have '1863.csv.gz'. Next, you should think about your other arguments in .read_csv(). Would the default values sometimes be more appropriate? For instance, how does sep=' ' help you here? Again, the link that I sent before can be helpful here!

Best, Joachim
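
To illustrate the point about sep, here is a minimal sketch. The two rows are made up for illustration (they only imitate a comma-separated layout and are not taken from the NOAA file), and they are read from an in-memory string rather than from the url.

```python
import io

import pandas as pd

# Made-up, comma-separated sample rows (NOT real data from the exercise file).
sample = "STATION_A,18630101,TMAX,22\nSTATION_A,18630101,TMIN,-7\n"

# sep=' ' looks for spaces; there are none, so each line ends up in one column.
with_space_sep = pd.read_csv(io.StringIO(sample), sep=" ", header=None)

# The default separator is a comma, which matches how the rows are written.
with_default_sep = pd.read_csv(io.StringIO(sample), header=None)

print(with_space_sep.shape)    # (2, 1)
print(with_default_sep.shape)  # (2, 4)
```

If the separator in the file is a comma, leaving sep at its default is usually the simplest choice.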

lassearpe commented 3 years ago

I found Internet Explorer 11 to be suitable for the .ftp-format.

Best, Lasse

jonasfredslundkofoed commented 3 years ago

Hi Joachim,

I have the same issue as Lasse, and I'm afraid that I cannot use Internet Explorer 11 since I'm on a Mac. I have tried to open the link in Safari, Google Chrome and Firefox, and every time I'm asked for a username and password that I don't have. What am I missing? I have attached a picture of the dialogue box asking for the password. When I enter the link in my browser (any of the three), I'm asked if I want to open the file in Finder.

Best, Jonas

[Screenshot 2021-07-19 at 19.15.47]

joachimkrasmussen commented 3 years ago

Thanks for the comment Lasse - Internet Explorer can indeed be used here.

If you are struggling with access, I will give you the url that you will need to open with pd.read_csv():

url = 'ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/1863.csv.gz'

Are you able to proceed now?

Best, Joachim

jonasfredslundkofoed commented 3 years ago

Yes, I'm able to continue. Thank you :)

Casper17-max commented 3 years ago

I still don't know what to do with the last columns. Neither the guide nor Google helped much. [image]

joachimkrasmussen commented 3 years ago

You should pay attention to two features of .read_csv(): compression and header. What is the appropriate argument for each of these? Pay attention to the last two characters of the url and look at possible values for compression here.

Best, Joachim

lucasaabech commented 3 years ago

I've now tried all of the different compression types, and I'm not getting anything from my dataset. I've tried numerous header values as well. Nothing really helps.

[image]

joachimkrasmussen commented 3 years ago

Hi Lucas, you are very close here. Try header = None to fix the problem of your first row of data being used as the column names. Then you should be able to proceed to the next exercise, right?

And Casper! Sorry that I did not notice, but your last columns are just fine (in the next exercises, it will be clear that you should only work with the first four columns). You also just seem to struggle with the column labels.

Best, Joachim
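
A small sketch of what header = None changes, assuming a gzip-compressed file without a header row. The file is written locally with made-up rows (eight comma-separated fields each, purely for illustration), so nothing here depends on the ftp server.

```python
import gzip
import os
import tempfile

import pandas as pd

# Made-up rows with no header line (NOT real data from the exercise file).
rows = "STATION_A,18630101,TMAX,22,,,E,\nSTATION_A,18630101,TMIN,-7,,,E,\n"

# Write them to a temporary .csv.gz file so the example runs offline.
path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")
with gzip.open(path, "wt") as f:
    f.write(rows)

# Default header handling: the first data row is consumed as column names.
with_header = pd.read_csv(path)

# header=None keeps every row as data and labels the columns 0, 1, 2, ...
no_header = pd.read_csv(path, header=None)

# Keep only the first four columns, as mentioned for the next exercises.
first_four = no_header.iloc[:, :4]

print(len(with_header), len(no_header))  # 1 2
print(first_four.shape)                  # (2, 4)
```

The row that goes missing with the default setting is the one that was turned into column labels.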

johankll commented 3 years ago

Hi Joachim,

  1. How do we infer the right compression from the link you posted earlier (this link)? Is it simply the ".gz" at the end of the URL that tells us that the file is compressed using gzip?
  2. Is compression='gzip' the correct specification for CSV-files in general?
  3. Is the compression-statement necessary? I started without any compression-statement, and did not notice any problems.

Thanks in advance.

joachimkrasmussen commented 3 years ago

Hi Johan,

Let me try and answer all three questions:

1: In general, you can infer the right compression by looking at the file extension.

2 + 3: As the link also mentions, .read_csv() will generally infer the correct compression without you specifying gzip (the default value of the compression argument is 'infer'). If this fails for some reason, you may want to take a closer look at your file and understand how it is compressed. If you do not get any problems without the compression argument, then the decompression was probably handled correctly by the default setting.

Was this helpful?

Best, Joachim
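
A minimal sketch of the answer above, using a small made-up file written locally. It compares the default compression='infer' with an explicit compression='gzip', and then reads a copy of the same bytes saved under a hypothetical name without the .gz extension, where the extension can no longer guide the inference.

```python
import gzip
import os
import shutil
import tempfile

import pandas as pd

tmp = tempfile.mkdtemp()

# Made-up two-row file, gzip-compressed and saved with a .gz extension.
gz_path = os.path.join(tmp, "sample.csv.gz")
with gzip.open(gz_path, "wt") as f:
    f.write("a,1\nb,2\n")

# Default: compression='infer' picks gzip based on the '.gz' extension.
inferred = pd.read_csv(gz_path, header=None)

# Spelling it out gives the same result.
explicit = pd.read_csv(gz_path, header=None, compression="gzip")

# Same bytes, but without the extension (hypothetical file name).
# Here we state the compression explicitly instead of relying on inference.
bare_path = os.path.join(tmp, "sample_copy")
shutil.copy(gz_path, bare_path)
from_bare = pd.read_csv(bare_path, header=None, compression="gzip")

print(inferred.equals(explicit))   # True
print(inferred.equals(from_bare))  # True
```

A plain, uncompressed .csv file needs no compression argument at all, so compression='gzip' is not a general setting for CSV files.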

johankll commented 3 years ago

Hi Joachim,

Indeed. Thank you!