spyder-ide / spyder

Official repository for Spyder - The Scientific Python Development Environment
https://www.spyder-ide.org
MIT License

Code from one file not updating when called from another if run directly in the console #5877

Closed EricBell271 closed 6 years ago

EricBell271 commented 7 years ago

Description

What steps will reproduce the problem?

  1. I define a function in one file and test its output from another file. When I change the first file, the changes do not carry through to the other file; I can only run the program once without having to reset the API. Example (a minimal sketch of the two files follows below):

RUN1:

File1 : return 'this is a function'

File2 : run it, and it gives output, 'this is a function'

RUN2 File1 : return 'this is a slightly different function'

File2 : run it, and it gives the output from the first run, and not the output of the slightly different function.
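
For illustration only (hypothetical file and function names, not the reporter's actual code), the two-file setup described above would look roughly like this:

file1.py:

def my_function():
    return 'this is a function'

file2.py:

from file1 import my_function
print(my_function())

On the first run, file2.py prints 'this is a function'. After editing file1.py so the function returns 'this is a slightly different function' and running file2.py again in the same console, the old string is still printed.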

What is the expected output? What do you see instead?

Please provide any additional information below

Python 3.6

Version and main components

Dependencies

pyflakes >=0.6.0 :  1.5.0 (OK)
pep8 >=0.6       :  1.7.0 (OK)
pygments >=2.0   :  2.2.0 (OK)
qtconsole >=4.2.0:  4.3.0 (OK)
nbconvert >=4.0  :  5.1.1 (OK)
pandas >=0.13.1  :  0.20.1 (OK)
numpy >=1.7      :  1.12.1 (OK)
sphinx >=0.6.6   :  1.5.6 (OK)
rope >=0.9.4     :  0.9.4-1 (OK)
jedi >=0.9.0     :  0.10.2 (OK)
matplotlib >=1.0 :  2.0.2 (OK)
sympy >=0.7.3    :  1.0 (OK)
pylint >=0.25    :  1.6.4 (OK)
CAM-Gerlach commented 7 years ago

Thanks for reporting. Do you mean the Spyder API? It doesn't sound like it, and I'm not sure what API you are referring to (or whether you in fact mean something else). Because it isn't clear what you are referring to, it is difficult to work out what is going on, replicate it, and suggest fixes. Are both functions bound to the same name? Are you importing them in another script file? If you can provide sample code snippets of File1, File2 and how you are calling them, that would be very helpful. It doesn't sound like this is a Spyder/UMR-specific issue, and it may even be expected behavior, but it is impossible to tell from the information provided above.

As an aside, your versions of Spyder, Python and their dependencies (especially Spyder itself) are rather out of date; while updating probably won't fix this problem, it would be a good idea so you get new features and bug fixes. Assuming you don't have pinned packages, conda update anaconda from the command line should do the trick.

EricBell271 commented 7 years ago

1) Run this code in Spyder (File 1; it is imported below as crawler):

# -*- coding:utf-8 -*-
# This script will download all the 10-K, 10-Q and 8-K
# provided that of company symbol and its cik code.

import requests
import os
import errno
from bs4 import BeautifulSoup, SoupStrainer
from config import DEFAULT_DATA_PATH

class SecCrawler():

    def __init__(self):
        self.hello = "Welcome to Sec Crawler!"
        print("Path of the directory where data will be saved: " + DEFAULT_DATA_PATH)

    def make_directory(self, company_code, cik, priorto, filing_type):
        # Making the directory to save company filings
        path = os.path.join(DEFAULT_DATA_PATH, company_code, cik, filing_type)

        if not os.path.exists(path):
            try:
                os.makedirs(path)
            except OSError as exception:
                if exception.errno != errno.EEXIST:
                    raise

    def save_in_directory(self, company_code, cik, priorto, doc_list,
        doc_name_list, filing_type):
        # Save every text document into its respective folder
        for j in range(len(doc_list)):
            base_url = doc_list[j]
            r = requests.get(base_url)
            data = r.text
            # print(data)            
            soup_text = BeautifulSoup(data, "lxml")
            text_extract = 'P'
            for i in soup_text.find_all('P'):
# #                actual_text
# #                print(actual_text)
                print(i)
            path = os.path.join(DEFAULT_DATA_PATH, company_code, cik,
                filing_type, doc_name_list[j])
#            print(path)
            with open(path, "ab") as f:

                f.write(data.encode('ascii', 'ignore'))

    def filing_10Q(self, company_code, cik, priorto, count):

        self.make_directory(company_code, cik, priorto, '10-Q')

        # generate the url to crawl
        base_url = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK="+str(cik)+"&type=10-Q&dateb="+str(priorto)+"&owner=exclude&output=xml&count="+str(count)
        print ("started 10-Q " + str(company_code))
        r = requests.get(base_url)
        data = r.text

        # get doc list data
        doc_list, doc_name_list = self.create_document_list(data)

        try:
            self.save_in_directory(company_code, cik, priorto, doc_list, doc_name_list, '10-Q')
        except Exception as e:
            print (str(e))

        print ("Successfully downloaded all the files")

    def filing_10K(self, company_code, cik, priorto, count):

        self.make_directory(company_code,cik, priorto, '10-K')

        # generate the url to crawl
        base_url = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK="+str(cik)+"&type=10-K&dateb="+str(priorto)+"&owner=exclude&output=xml&count="+str(count)
        print ("started 10-K " + str(company_code))

        r = requests.get(base_url)
        data = r.text

        # get doc list data
        doc_list, doc_name_list = self.create_document_list(data)

        try:
            self.save_in_directory(company_code, cik, priorto, doc_list, doc_name_list, '10-K')
        except Exception as e:
            print (str(e))

        print ("Successfully downloaded all the files")

    def filing_8K(self, company_code, cik, priorto, count):
        try:
            self.make_directory(company_code,cik, priorto, '8-K')
        except Exception as e:
            print (str(e))

        # generate the url to crawl
        base_url = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK="+str(cik)+"&type=8-K&dateb="+str(priorto)+"&owner=exclude&output=xml&count="+str(count)

        print ("started 8-K" + str(company_code))
        r = requests.get(base_url)
        data = r.text
        print(data)
        # get doc list data
        doc_list, doc_name_list = self.create_document_list(data)

        try:
            self.save_in_directory(company_code, cik, priorto, doc_list, doc_name_list, '8-K')
        except Exception as e:
            print (str(e))

        print ("Successfully downloaded all the files")

    def filing_13F(self, company_code, cik, priorto, count):
        try:
            self.make_directory(company_code, cik, priorto, '13-F')
        except Exception as e:
            print (str(e))

        # generate the url to crawl
        base_url = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK="+str(cik)+"&type=13F&dateb="+str(priorto)+"&owner=exclude&output=xml&count="+str(count)
        print ("started 10-Q "+ str(company_code))
        r = requests.get(base_url)
        data = r.text

        doc_list, doc_name_list = self.create_document_list(data)

        try:
            self.save_in_directory(company_code, cik, priorto, doc_list,
                doc_name_list, '13-F')
        except Exception as e:
            print (str(e))

        print ("Successfully downloaded all the files")

    def filing_SD(self, company_code, cik, priorto, count):

        self.make_directory(company_code, cik, priorto, 'SD')

        # generate the url to crawl
        base_url = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK="+str(cik)+"&type=sd&dateb="+str(priorto)+"&owner=exclude&output=xml&count="+str(count)
        print ("started SD " + str(company_code))
        r = requests.get(base_url)
        data = r.text

        # get doc list data
        doc_list, doc_name_list = self.create_document_list(data)

        try:
            self.save_in_directory(company_code, cik, priorto, doc_list, doc_name_list, 'SD')
        except Exception as e:
            print (str(e))

        print ("Successfully downloaded all the files")
#    import lxml.html
    def create_document_list(self, data):
        # parse fetched data using BeautifulSoup

        soup = BeautifulSoup(data, "lxml")
        print(data)             
        # store the link in the list

        link_list = list()

        # If the link is .htm convert it to .html
        for link in soup.find_all('filinghref'):
            url = link.string
            if link.string.split(".")[len(link.string.split("."))-1] == "htm":
                url += "l"
#            print(url)
            link_list.append(url)
        link_list_final = link_list

        print ("Number of files to download {0}".format(len(link_list_final)))
        print ("Starting download....")

        # List of url to the text documents
        doc_list = list()
        # List of document names
        doc_name_list = list()

#        actual_text = list()
#        text_extract= '<P STYLE="margin-top:12pt; margin-bottom:0pt; font-size:10pt; font-family:Times New Roman">'
        # Get all the doc
#        text_extract= text_extract.encode('utf-8')
        for k in range(len(link_list_final)):
            required_url = link_list_final[k].replace('-index.html', '')

            txtdoc = required_url + ".txt"
            docname = txtdoc.split("/")[-1]

            doc_list.append(txtdoc)
            doc_name_list.append(docname)

        return doc_list, doc_name_list

    def remove_text(self, doc_name_list):
        ###write a function that opens all of the documents and scrapes the text 
        pass

2) Run this code from the other Python file (File 2):

# -*- coding: utf-8 -*-
"""
Created on Thu Nov 30 01:34:03 2017

@author: eric
"""
# -*- coding:utf-8 -*-
import time
from crawler import SecCrawler

def test():
    t1 = time.time()
    # file containing company names and corresponding cik codes
    seccrawler = SecCrawler()

    company_code_list = list()   # company code list
    cik_list = list()            # cik code list
    date_list = list()           # prior date list
    count_list = list()

    try:
        crs = open("C:/Sentibis/Wes_G_Sentiment/kermit/sec-edgar-master/SECEdgar/data.txt", "r")
    except:
        print ("No input file Found")

    # get the company quotes and cik numbers from the file.

    for columns in (raw.strip().split() for raw in crs):
        company_code_list.append(columns[0])
        cik_list.append(columns[1])
        date_list.append(columns[2])
        count_list.append(columns[3])

    # call different  API from the crawler
    for i in range(1, len(cik_list)):
        seccrawler.filing_SD(str(company_code_list[i]), str(cik_list[i]),
            str(date_list[i]), str(count_list[i]))
#        seccrawler.filing_10K(str(company_code_list[i]), str(cik_list[i]),
#            str(date_list[i]), str(count_list[i]))
#        seccrawler.filing_8K(str(company_code_list[i]), str(cik_list[i]),
#            str(date_list[i]), str(count_list[i]))
#        seccrawler.filing_10Q(str(company_code_list[i]), str(cik_list[i]),
#            str(date_list[i]), str(count_list[i]))

    t2 = time.time()

    print ("Total Time taken: "),
    print (t2 - t1)
    crs.close()

if __name__ == '__main__':
    test()

3) Make any kind of change to the code from 1), run again, and you will still see the output from 1).

NOTE: I ran the update but it did not fix the problem.

Could not attach data.txt; its contents are:

Quote  CIK         priorto(YYYYMMDD)  Count
AAPL   0000320193  20170101           100
ACN    1467373     20170101           100

CAM-Gerlach commented 7 years ago

Thanks for your response. Sorry, but could you simplify that to a minimal reproducible example, i.e. the shortest reasonable amount of code that demonstrates the problem with Spyder? I'm still not clear on what it is you are trying to do, what is going wrong, and how the Spyder IDE has anything to do with it.

Also, while you are at it, could you fix the formatting? It's totally broken as is; even if I wanted to run the hundreds of lines of code you posted, I couldn't, since I can't copy and paste it into Spyder and have it work, given the semantic importance of indentation and formatting in Python. For each code block, you can include three backticks on a line at the top and bottom to make it render properly. I suggest an organization like this if you want people to make sense of your problem:

File 1:

FILE 1 CODE

File2:

FILE 2 CODE

[Clear description of how File 1 and File 2 interact/are run, in prose or in code]

Thanks.

jnsebgosselin commented 7 years ago

@real-quantdog Do you have the UMR option enabled?

[screenshot of the User Module Reloader (UMR) option in Spyder's preferences]

ccordoba12 commented 7 years ago

@real-quantdog, how are you running your file 2?

CAM-Gerlach commented 7 years ago

@real-quantdog Thanks for clarifying; it is much clearer now. At least for me, aside from enabling it, two things are needed for UMR to kick in and work properly: once I've made a change, re-run the first file to reload those defs into memory, and then run the second file (or specifically its import statement) to rebind them to whatever I've named them there. Doing only one of those two, or doing them out of order, has generally resulted in the behavior you are seeing, at least for me. Does that work for you? (A rough sketch of that workflow follows below.)
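
As a rough sketch of that workflow (using the hypothetical file1/file2 names from the minimal example above, not the reporter's actual files):

# 1. After saving a change to file1.py, run file1.py with "Run File" so its
#    definitions are re-executed in the console.
# 2. Re-run the import line from file2.py so the name is rebound to the
#    reloaded definition:
from file1 import my_function
# 3. Then run the rest of file2.py (or the whole file) as usual.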

EricBell271 commented 7 years ago

UMR is enabled. I run the files in order: File 1, then File 2. I run each of them by pressing Ctrl+A, then Ctrl+Enter. Even after exiting and restarting Spyder, I cannot get any changes picked up within the same run. Both files are in the same directory. I have also tried running everything in a single script; that did not pick up the changes either.

Is this not Spyder but an issue with the code? I got this package from SEC-Edgar on GitHub. It was working fine until I started using BeautifulSoup.

CAM-Gerlach commented 7 years ago

Thanks for the additional information. At least for me, if I don't just run File 2 with "Run File" (F5 by default), I always run at least the equivalent of File 1 (my module/package) with "Run File", as I don't believe UMR is triggered otherwise (I can't conveniently test, as I'm not on my main Python dev machine at the moment). I then rebind the names in my equivalent of File 2 just by running the relevant chunk, and likewise run my main code separately, since those steps can be executed independently. In your case, you can just run File 2 directly with "Run File" and it should work (in my exploratory/scientific workflow, I usually do it the above way, as my "File 2" tends to be a fairly involved script I don't want to rerun in full).

That all works, at least for me, so could you try running your files with "Run File" and see if it makes any difference? You should be able to see when your modules get reloaded, since Spyder prints a message in red in the console. Does that occur?

jnsebgosselin commented 7 years ago

As @CAM-Gerlach pointed out, try running File 2 with F5. The module imported in File 2 will then be re-imported with the changes you made, so there is no need to run File 1 first.

ccordoba12 commented 7 years ago

@real-quantdog, thanks for reporting. We can improve the current situation by loading the %autoreload IPython magic by default.

This magic tries to update the code you defined in one file (let's call it your module file) and called in another (let's call it your run file), as soon as you save your module file. That way you won't be forced to run your entire run file every time.
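
In the meantime, a minimal sketch of enabling the magic manually in Spyder's IPython console (these are standard IPython commands, not Spyder-specific settings):

# Load the autoreload extension
%load_ext autoreload
# Automatically reload all modified modules before executing any code
%autoreload 2

With that active, saving the module file and then re-running only the run file should pick up the changes.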

The change is very simple, so I decided to do it in our next version (3.2.5), which I plan to release this weekend.