Closed: EricBell271 closed this issue 6 years ago.
Thanks for reporting. Do you mean the Spyder API? It doesn't sound like it, and I'm not sure what API you are referring to (or whether you in fact mean something else). It really isn't clear what you are describing, which makes it difficult to discern what is going on, replicate it, and suggest fixes. Are both functions bound to the same name? Are you importing them in another script file? If you can provide sample code snippets of File 1, File 2, and how you are calling them, that would be very helpful. This doesn't sound like a Spyder/UMR-specific issue, and may even be expected behavior, but it is impossible to tell from the information provided above.
As an aside, your versions of Spyder, Python, and their dependencies (especially Spyder itself) are rather out of date. Updating probably won't fix this problem, but it is a good idea anyway so you get new features and bug fixes. Assuming you don't have pinned packages, running `conda update anaconda` from the command line should do the trick.
1) Run this code (File 1, saved as crawler.py) in Spyder:
```python
# -*- coding: utf-8 -*-
# This script will download all the 10-K, 10-Q and 8-K filings,
# provided the company symbol and its CIK code.
import requests
import os
import errno
from bs4 import BeautifulSoup
from config import DEFAULT_DATA_PATH


class SecCrawler():

    def __init__(self):
        self.hello = "Welcome to SEC Crawler!"
        print("Path of the directory where data will be saved: " + DEFAULT_DATA_PATH)

    def make_directory(self, company_code, cik, priorto, filing_type):
        # Make the directory to save company filings
        path = os.path.join(DEFAULT_DATA_PATH, company_code, cik, filing_type)
        if not os.path.exists(path):
            try:
                os.makedirs(path)
            except OSError as exception:
                if exception.errno != errno.EEXIST:
                    raise

    def save_in_directory(self, company_code, cik, priorto, doc_list,
                          doc_name_list, filing_type):
        # Save every text document into its respective folder
        for j in range(len(doc_list)):
            base_url = doc_list[j]
            r = requests.get(base_url)
            data = r.text
            soup_text = BeautifulSoup(data, "lxml")
            for i in soup_text.find_all('P'):
                print(i)
            path = os.path.join(DEFAULT_DATA_PATH, company_code, cik,
                                filing_type, doc_name_list[j])
            with open(path, "ab") as f:
                f.write(data.encode('ascii', 'ignore'))

    def filing_10Q(self, company_code, cik, priorto, count):
        self.make_directory(company_code, cik, priorto, '10-Q')
        # Generate the URL to crawl
        base_url = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" + str(cik) + "&type=10-Q&dateb=" + str(priorto) + "&owner=exclude&output=xml&count=" + str(count)
        print("started 10-Q " + str(company_code))
        r = requests.get(base_url)
        data = r.text
        # Get the document list data
        doc_list, doc_name_list = self.create_document_list(data)
        try:
            self.save_in_directory(company_code, cik, priorto, doc_list,
                                   doc_name_list, '10-Q')
        except Exception as e:
            print(str(e))
        print("Successfully downloaded all the files")

    def filing_10K(self, company_code, cik, priorto, count):
        self.make_directory(company_code, cik, priorto, '10-K')
        # Generate the URL to crawl
        base_url = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" + str(cik) + "&type=10-K&dateb=" + str(priorto) + "&owner=exclude&output=xml&count=" + str(count)
        print("started 10-K " + str(company_code))
        r = requests.get(base_url)
        data = r.text
        # Get the document list data
        doc_list, doc_name_list = self.create_document_list(data)
        try:
            self.save_in_directory(company_code, cik, priorto, doc_list,
                                   doc_name_list, '10-K')
        except Exception as e:
            print(str(e))
        print("Successfully downloaded all the files")

    def filing_8K(self, company_code, cik, priorto, count):
        try:
            self.make_directory(company_code, cik, priorto, '8-K')
        except Exception as e:
            print(str(e))
        # Generate the URL to crawl
        base_url = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" + str(cik) + "&type=8-K&dateb=" + str(priorto) + "&owner=exclude&output=xml&count=" + str(count)
        print("started 8-K " + str(company_code))
        r = requests.get(base_url)
        data = r.text
        # Get the document list data
        doc_list, doc_name_list = self.create_document_list(data)
        try:
            self.save_in_directory(company_code, cik, priorto, doc_list,
                                   doc_name_list, '8-K')
        except Exception as e:
            print(str(e))
        print("Successfully downloaded all the files")

    def filing_13F(self, company_code, cik, priorto, count):
        try:
            self.make_directory(company_code, cik, priorto, '13-F')
        except Exception as e:
            print(str(e))
        # Generate the URL to crawl
        base_url = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" + str(cik) + "&type=13F&dateb=" + str(priorto) + "&owner=exclude&output=xml&count=" + str(count)
        print("started 13-F " + str(company_code))
        r = requests.get(base_url)
        data = r.text
        doc_list, doc_name_list = self.create_document_list(data)
        try:
            self.save_in_directory(company_code, cik, priorto, doc_list,
                                   doc_name_list, '13-F')
        except Exception as e:
            print(str(e))
        print("Successfully downloaded all the files")

    def filing_SD(self, company_code, cik, priorto, count):
        self.make_directory(company_code, cik, priorto, 'SD')
        # Generate the URL to crawl
        base_url = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" + str(cik) + "&type=sd&dateb=" + str(priorto) + "&owner=exclude&output=xml&count=" + str(count)
        print("started SD " + str(company_code))
        r = requests.get(base_url)
        data = r.text
        # Get the document list data
        doc_list, doc_name_list = self.create_document_list(data)
        try:
            self.save_in_directory(company_code, cik, priorto, doc_list,
                                   doc_name_list, 'SD')
        except Exception as e:
            print(str(e))
        print("Successfully downloaded all the files")

    def create_document_list(self, data):
        # Parse the fetched data using BeautifulSoup
        soup = BeautifulSoup(data, "lxml")
        # Store the links in a list
        link_list = list()
        # If a link ends in .htm, convert it to .html
        for link in soup.find_all('filinghref'):
            url = link.string
            if link.string.split(".")[-1] == "htm":
                url += "l"
            link_list.append(url)
        link_list_final = link_list
        print("Number of files to download {0}".format(len(link_list_final)))
        print("Starting download....")
        # List of URLs to the text documents
        doc_list = list()
        # List of document names
        doc_name_list = list()
        for k in range(len(link_list_final)):
            required_url = link_list_final[k].replace('-index.html', '')
            txtdoc = required_url + ".txt"
            docname = txtdoc.split("/")[-1]
            doc_list.append(txtdoc)
            doc_name_list.append(docname)
        return doc_list, doc_name_list

    def remove_text(self, doc_name_list):
        # TODO: write a function that opens all of the documents and scrapes the text
        pass
```
2) Then run this code (File 2) from the other Python file:
```python
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 30 01:34:03 2017

@author: eric
"""
import time
from crawler import SecCrawler


def test():
    t1 = time.time()
    seccrawler = SecCrawler()
    company_code_list = list()  # company code list
    cik_list = list()           # CIK code list
    date_list = list()          # prior-to date list
    count_list = list()
    # File containing company names and corresponding CIK codes
    try:
        crs = open("C:/Sentibis/Wes_G_Sentiment/kermit/sec-edgar-master/SECEdgar/data.txt", "r")
    except OSError:
        print("No input file found")
    # Get the company quotes and CIK numbers from the file
    for columns in (raw.strip().split() for raw in crs):
        company_code_list.append(columns[0])
        cik_list.append(columns[1])
        date_list.append(columns[2])
        count_list.append(columns[3])
    # Call the different APIs from the crawler
    # (start at 1 to skip the header row of data.txt)
    for i in range(1, len(cik_list)):
        seccrawler.filing_SD(str(company_code_list[i]), str(cik_list[i]),
                             str(date_list[i]), str(count_list[i]))
        # seccrawler.filing_10K(str(company_code_list[i]), str(cik_list[i]),
        #                       str(date_list[i]), str(count_list[i]))
        # seccrawler.filing_8K(str(company_code_list[i]), str(cik_list[i]),
        #                      str(date_list[i]), str(count_list[i]))
        # seccrawler.filing_10Q(str(company_code_list[i]), str(cik_list[i]),
        #                       str(date_list[i]), str(count_list[i]))
    t2 = time.time()
    print("Total time taken:", t2 - t1)
    crs.close()


if __name__ == '__main__':
    test()
```
3) Make any kind of change to the code from 1), run both again, and you will still see the output from the original 1).

NOTE: I ran the update but it did not help.
I could not attach data.txt; its contents are:

```
Quote  CIK         priorto(YYYYMMDD)  Count
AAPL   0000320193  20170101           100
ACN    1467373     20170101           100
```
Thanks for your response. Sorry, but could you simplify that to a minimal reproducible example, i.e. the shortest reasonable amount of code that demonstrates the problem with Spyder? I'm still not clear on what it is you are trying to do, what is going wrong, and how the Spyder IDE has anything to do with it.
Also, while you are at it, could you fix the formatting? It's completely broken as is: even if I wanted to run the hundreds of lines of code you posted, I could not, as I can't copy and paste it into Spyder and have it work, given the semantic importance of indentation in Python. For each code block, you can include three backticks on the lines above and below to make it render properly, so I suggest an organization like this if you want people to make sense of your problem:
File 1:
```
FILE 1 CODE
```
File 2:
```
FILE 2 CODE
```
[Clear description of how File 1 and File 2 interact/are run, in prose or in code]
Thanks.
@real-quantdog Do you have the UMR option enabled?
@real-quantdog, how are you running your File 2?
@real-quantdog Thanks for clarifying; it is much clearer now. At least for me, aside from enabling UMR, the two steps needed to get it to kick in and work properly are: once I've made a change, re-run the first file to reload those defs into memory, and then run the second file (or specifically its import statement portion) to rebind them to whatever I've named them there. Doing only one of those two, or doing them out of order, has generally resulted in the behavior you are seeing, at least for me. Does that work for you?
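For illustration, a minimal sketch of that second (rebinding) step, assuming File 1 is saved as crawler.py as your import suggests:

```python
# After re-running File 1 (crawler.py) so its defs are reloaded,
# re-execute just these lines from File 2 to rebind the names:
from crawler import SecCrawler
seccrawler = SecCrawler()  # recreate the instance from the reloaded class
```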
UMR is enabled. I run them in the order File 1, then File 2, and I run each in full with Ctrl-A, Ctrl-Enter. I can exit out and restart Spyder, and I still cannot make any changes take effect in the same run. They are in the same directory. I have also tried running them as one script; it did not make any difference.

Is this not Spyder, but an issue with the code? I got this package from SEC Edgar on GitHub. It was working fine until I started using BeautifulSoup.
Thanks for the additional information. At least for me, if I don't just run File 2 with "Run File" (F5 by default), I always run at least the equivalent of File 1 (my module/package) with "Run File", as otherwise I don't believe UMR is triggered (I can't conveniently test, as I'm not on my main Python dev machine at the moment). I usually rebind the names to whatever I've called them in my equivalent of File 2 just by running the relevant chunk, and the same with running my main code, as those steps can be executed independently. In your case, you can just run File 2 directly with "Run File" and it should work (in my exploratory/scientific workflow, I usually do it the above way, as my "File 2" tends to be a fairly involved script I don't want to rerun all of).

That all works at least for me, so could you try running your files with "Run File" and see if it makes any difference? You should see when your modules get reloaded, as Spyder will print it in red in the console. Does that occur?
As @CAM-Gerlach pointed out, try running File 2 with F5. The module imported in File 2 will then be re-imported with the changes you made, and there is no need to run File 1 first.
@real-quantdog, thanks for reporting. We can improve the current situation by loading the `%autoreload` IPython magic by default. This magic tries to update the code you defined in one file (let's call it your module file) and called in another (let's call it your run file) as soon as you save your module file. That way you won't be forced to run your entire run file every time.
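For anyone who wants to try it before that release: autoreload is the standard IPython extension, so it should already be possible to enable it manually in the IPython console. A minimal sketch of the usual incantation:

```python
%load_ext autoreload
%autoreload 2  # reload all modules automatically before executing typed code
```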
The change is very simple, so I decided to do it in our next version (3.2.5), which I plan to release this weekend.
Description
What steps will reproduce the problem?
Run 1:
File 1: return 'this is a function'
File 2: run it, and it gives the output 'this is a function'

Run 2:
File 1: change it to return 'this is a slightly different function'
File 2: run it, and it gives the output from the first run, not the output of the slightly different function.
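In code, this might look like the following minimal sketch (hypothetical file names mymod.py and runner.py):

```python
# mymod.py (File 1)
def f():
    return 'this is a function'
```

```python
# runner.py (File 2)
from mymod import f
print(f())  # after editing mymod.py, this still prints the old string
```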
What is the expected output? What do you see instead?
Please provide any additional information below
Python 3.6
Version and main components
Dependencies