Closed: nir-jackson closed this issue 9 years ago
I can't find it in the C# script. Do you have other scripts that create it?
On Wed, Mar 4, 2015 at 4:41 PM, Nir Jackson notifications@github.com wrote:
@AlexGr2 https://github.com/AlexGr2
the script ThreeLetterListGather.py didn't create the 2.txt file that holds all the words with two or fewer letters. Please fix this.
Add `bool first = true;` before all of the loops, then add
`if (first) { word = "2"; first = false; }` right after this line: `word = alpha[num] + alpha[num2] + alpha[num3]`.
This won't create the אאא file (which is empty anyway) and will create the 2 file instead.
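If it helps, here is roughly what that change looks like in Python (the snippet above is written C#-style, but the script itself is Python). The three-letter `alpha` below is a hypothetical shortened stand-in for the real Hebrew letter tuple, and two loops stand in for three, just to keep the sketch small:

```python
# Sketch of the suggested first-iteration flag, with a hypothetical
# three-letter alphabet standing in for the real Hebrew tuple.
alpha = ("a", "b", "c")
first = True
names = []
for num in range(len(alpha)):
    for num2 in range(len(alpha)):
        word = alpha[num] + alpha[num2]
        if first:
            # Redirect the very first (empty-anyway) combination
            # to the merged short-words file instead.
            word = "2"
            first = False
        names.append(word + ".txt")
print(names[0])  # the first file is now 2.txt, not aa.txt
```

The same guard drops straight into the real triple loop, right after the `word = alpha[num] + alpha[num2] + alpha[num3]` line.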
2.txt creation added
On Wed, Mar 4, 2015 at 4:52 PM, Nir Jackson notifications@github.com wrote:
Reopened #31 https://github.com/shenkarlab/Off-The-record/issues/31.
import urllib, json
import collections
from collections import Counter
import operator
import sys, os
import codecs

__author__ = 'AlexGruber'

word = "סתם"
alpha = (u"א", u"ב", u"ג", u"ד", u"ה", u"ו", u"ז", u"ח", u"ט", u"י",
         u"כ", u"ל", u"מ", u"נ", u"ס", u"ע", u"פ", u"צ", u"ק", u"ר",
         u"ש", u"ת", u"ץ", u"ך", u"ם", u"ן")
path = "testFolder"
mergedJspnPath = "pythonfiles"
rootPath = "c:/"
poli = [line.strip() for line in open(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'potilics.txt'))]
first = True
for num in range(0, 22):
    for num2 in range(0, 22):
        for num3 in range(0, 23):
            word = alpha[num] + alpha[num2] + alpha[num3]
            if first:
                word = "2"
                first = False
            newpath = os.path.join(rootPath, mergedJspnPath, word + ".txt")
            print newpath
            fileOpenWrite = open(newpath, 'a')
            count = 0
            fileOpenWrite.write("[")
            for i in range(0, len(poli)):
                temppath = os.path.join(rootPath, path + poli[i], word + ".txt")
                if os.path.exists(temppath):
                    if count == 0:
                        fileOpenWrite.write("{\"id\":\"" + poli[i] + "\",\"resarr\":")
                    else:
                        fileOpenWrite.write(",{\"id\":\"" + poli[i] + "\",\"resarr\":")
                    count += 1
                    fileREAD = codecs.open(temppath, "r", "utf-8")
                    foundbug = False
                    for line in fileREAD.readlines():
                        for j in range(0, len(line)):
                            if j < len(line) - 1:
                                if line[j] == ']' and line[j + 1] == '[':
                                    foundbug = True
                                    break
                        if foundbug:
                            fileOpenWrite.write("]")
                            break
                        fileOpenWrite.write(line)
                    fileOpenWrite.write('}')
            fileOpenWrite.write(']')
print "DONE"
@AlexGr2 just one thing: the third for loop needs to be from 0 to 26 (not 23).
*Please send me all the updated scripts (including the potilics.txt without 66398526339).
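For reference, the bound matters because the `alpha` tuple in the script has 26 entries: the base letters that the first two loop positions iterate over, followed by the final forms at the end. A quick sanity check, reusing the tuple from the script:

```python
# The alphabet tuple from the script: base letters followed by
# the final forms (ץ ך ם ן) at the end.
alpha = (u"א", u"ב", u"ג", u"ד", u"ה", u"ו", u"ז", u"ח", u"ט", u"י",
         u"כ", u"ל", u"מ", u"נ", u"ס", u"ע", u"פ", u"צ", u"ק", u"ר",
         u"ש", u"ת", u"ץ", u"ך", u"ם", u"ן")
print(len(alpha))   # 26 entries in total
print(alpha[23:])   # letters that range(0, 23) never reaches as the third letter
```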
Here are all the updated scripts: MainScript.py with no per-count word check, ThreeLetterListGather.py that creates 2.txt, and potilics.txt without 66398526339.
On Wed, Mar 4, 2015 at 7:55 PM, Nir Jackson notifications@github.com wrote:
*please send me all the updated scripts (including the potilics.txt without 66398526339)
import urllib, json
import collections
from collections import Counter
import operator
import sys, os
import codecs

__author__ = 'AlexGruber'

threeLetterWordsPath = 'testFolder'
perDateCountPath = 'WholeDates'
allWordsPath = 'wordAllCount'
rootPath = "c:/"

wordCountPerPolitics = collections.defaultdict(int)
wordIdPerPolitics = collections.defaultdict(list)
politicsDictWords = collections.defaultdict(dict)
politicsDictId = collections.defaultdict(dict)
dateDict = collections.OrderedDict()
dateMutliArray = []
list_NextUrls = list()
buffer_list = list()
buffer_string = ""
allPostsIDsDict = {}
wordIdDict = collections.defaultdict(list)
politicDictWordsCount = {}
politicDictMilitaryCount = {}
politicDictSahbekimCount = {}
politicDictNarcistCount = {}
politicDictKalkalaCount = {}
politicDictRahlanCount = {}
tempdict = {}
errorCount = 0
testlist = list()
threeLetterList = list()
stuctList = list()
threeLetterDict = collections.defaultdict(list)
dateDictForThreeLetter = {}
avoidWords = [u'ישראל', u'כל', u'את', u'של', u'זה', u'על', u'או', u'גם', u'אז', u'רק',
              'and', 'the', 'of', u'עד', u'אשר', u'כי', u'אם',
              'in', 'to', 'a', 'that', 'is', 'for', 'with', 'are', 'this', 'have', 'The', 'on',
              u'-', 'it', 'from', 'a',
              'at', 'as', u'היא', u'אני', u'לא', u'עם', u'הוא', u'•', u'(', u')', u' ', u'', u"", "", " ",
              u':', u'"', u',', u'–', u'?', u'!', u'.', u'', u'', u'']
dateDict = collections.OrderedDict()
allPostsIDsDict = {}
wordIdDict = collections.defaultdict(list)
tempdict = {}
errorCount = 0
def createBuffer_Posts(url):
    list = ""
    response = urllib.urlopen(url)
    data = json.loads(response.read())
    try:
        parent = data["posts"]["data"]
        for index in range(len(parent)):
            try:
                allPostsIDsDict[parent[index]["id"]] = [parent[index]["message"], parent[index]["created_time"]]
            except:
                errorCount = 1
    except:
        print "no parent data"
    return list
def perDateCount(politicFileName):
    tempDate = ""
    tempBuildString = list()
    tempIdList = list()
    runOnce = True
    obj = []
    print "Counting Words"
    print len(allPostsIDsDict)
    tempJson = {"Dates": {}}
    for date, msg in dateDict.iteritems():
        print date
        Constdate = date.split("T")[0].split('-')[0] + ":" + date.split("T")[0].split('-')[1]
        if runOnce == True:
            tempDate = Constdate
            runOnce = False
        if tempDate == Constdate:
            try:
                tempWordsList = msg[0].split()
            except:
                errorCount = 1
            for word in tempWordsList:
                # if not word in avoidWords:
                word = word.replace(u'.', u"")
                word = word.replace(u'!', u"")
                word = word.replace(u'?', u"")
                word = word.replace(u'-', u"")
                word = word.replace(u',', u"")
                word = word.replace(u'"', u"")
                word = word.replace(u':', u"")
                word = word.replace(u'(', u"")
                word = word.replace(u')', u"")
                word = word.replace(u'*', u"")
                if not word in avoidWords:
                    try:
                        wordIdDict[word].append(msg[1])
                        tempBuildString.append(word)
                    except:
                        errorCount = 1
        else:
            JsonTempDate = tempDate
            tempDate = Constdate
            mostCommon = Counter(tempBuildString).most_common(10)
            printHeb = dict(mostCommon)
            sorted_x = sorted(printHeb.items(), key=operator.itemgetter(1), reverse=True)
            for key, value in sorted_x:
                if wordIdDict.has_key(key):
                    obj.append({"Word": key, "Amount": str(value), "array_id": list(set(wordIdDict[key]))})
            tempJson["Dates"][JsonTempDate] = [obj]
            obj = []
            del tempBuildString[:]
            del tempIdList[:]
            del tempWordsList[:]
            wordIdDict.clear()
    # with open('c:\\WholeDates\\' + politicFileName + '.txt', 'w') as outfile:
    #     json.dump(tempJson, outfile, indent=4)
    tempJson.clear()
    dateDict.clear()
    del tempBuildString[:]
    del tempIdList[:]
    del tempWordsList[:]
    wordIdDict.clear()
    print "finished counting " + politicFileName
def AllwordCounter(politicFileName):
    # print "Counting Words"
    # print len(allPostsIDsDict)
    for keyDict, valueDict in allPostsIDsDict.iteritems():
        tempWordsList = valueDict[0].split()
        id = []
        for word in tempWordsList:
            if not word in avoidWords:
                word = word.replace(u'.', u"")
                word = word.replace(u'!', u"")
                word = word.replace(u'?', u"")
                word = word.replace(u'-', u"")
                word = word.replace(u',', u"")
                word = word.replace(u'"', u"")
                word = word.replace(u':', u"")
                word = word.replace(u'(', u"")
                word = word.replace(u')', u"")
                word = word.replace(u'*', u"")
                # word = word.replace(u'\'', u"")
                wordCountPerPolitics[word] += 1
                wordIdPerPolitics[word].append(keyDict)
                dateDictForThreeLetter[word] = valueDict[1]
    politicsDictWords[politicFileName] = wordCountPerPolitics
    politicsDictId[politicFileName] = wordIdPerPolitics
    for politic, dictOfWords in politicsDictWords.iteritems():
        # print politic
        tempJson = {politicFileName: []}
        for word, count in dictOfWords.iteritems():
            # print word[1]
            tempJson[politicFileName].append({word: count})
            if wordIdPerPolitics.has_key(word):
                if dateDictForThreeLetter.has_key(word):
                    splitAllWordsByThreeLetters(threeLetterList, word, count, wordIdPerPolitics[word], politic, threeLetterDict, wordIdPerPolitics[word], dateDictForThreeLetter[word])
                tempJson[politicFileName].append(wordIdPerPolitics[word])
        # printToJsonTempDict(threeLetterDict, politicFileName)
        with open(os.path.join(rootPath, allWordsPath, politicFileName + '.txt'), 'w') as outfile:
            json.dump(tempJson, outfile, indent=4)
    print "finished counting " + politicFileName
    wordIdPerPolitics.clear()
    politicsDictWords.clear()
    dictOfWords.clear()
    dateDictForThreeLetter.clear()
    allPostsIDsDict.clear()
def splitAllWordsByThreeLetters(listOfThreeLetters, word, count, wordIds, folderName, tempDictJson, ids, date):
    try:
        tempDictJson[word].append([{"Word": word, "Amount": str(count), "date": date.split('+')[0], "Array_id": ids}])
    except:
        print "Unexpected error:", sys.exc_info()[0]
def printToJsonTempDict(tempDictJson, folderName):
    for keyDict, valueDict in tempDictJson.iteritems():
        newpath = os.path.join(rootPath, threeLetterWordsPath, folderName)
        if not os.path.exists(newpath):
            os.makedirs(newpath)
        with open(os.path.join(rootPath, threeLetterWordsPath, folderName, keyDict + '.txt'), 'a') as outfile:
            json.dump(tempDictJson[keyDict], outfile, indent=4)
    tempDictJson.clear()
tempNewWordsDict = {}
politicsArr = [line.strip() for line in open(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'potilics.txt'))]
for politic in politicsArr:
    url = "https://graph.facebook.com/" + politic + "?fields=posts.limit(200)%7Bmessage%7D&access_token=1719315164960950%7CnKFpk2SebwixsCQS3y7zQDPA1Ow"
    createBuffer_Posts(url)
    AllwordCounter(politic)
    for key, word in threeLetterDict.iteritems():
        # print word[0][0]['date']
        # print word[0][0]['Word'][:3]
        # print word[0][0]['Array_id']
        path = os.path.join(rootPath, threeLetterWordsPath, politic, word[0][0]['Word'][:3] + ".txt")
        if os.path.exists(path):
            with open(path, "r") as jsonFile:
                jsonToUpdate = json.load(jsonFile)
            found = False
            try:
                for index in range(0, len(jsonToUpdate)):
                    # print word[0][0]['Word']
                    if word[0][0]['Word'] == jsonToUpdate[index][0]["Word"]:
                        found = True
                        if word[0][0]['date'] > jsonToUpdate[index][0]["date"]:
                            jsonToUpdate[index][0]["date"] = word[0][0]['date']
                        jsonToUpdate[index][0]["Amount"] = int(jsonToUpdate[index][0]["Amount"]) + int(word[0][0]['Amount'])
                        jsonToUpdate[index][0]["Array_id"] = jsonToUpdate[index][0]["Array_id"] + word[0][0]["Array_id"]
                        with open(path, 'w') as UpdateJson:
                            UpdateJson.write(json.dumps(jsonToUpdate))
                if found == False:
                    # print word[0][0]['Word'], word[0][0]["date"], index, politic, int(word[0][0]['Amount'])
                    tempNewWordsDict[word[0][0]['Word']] = word[0][0]["date"], int(word[0][0]['Amount']), path, word[0][0]['Array_id']
            except:
                print "no parent data", politic
    for key, word in tempNewWordsDict.iteritems():
        # print key
        # print word[1]
        # print word[2]
        with open(word[2], 'r') as jsonToUpdate:
            json_data = json.load(jsonToUpdate)
            # json_data[0].append([{"Word" : key ,"Amount" : str(word[1]),"date" : word[0].split('+')[0]}])
            json_data.append([{"Word": key, "Amount": str(word[1]), "date": word[0].split('+')[0], "Array_id": word[3]}])
        with open(word[2], 'w') as f:
            f.write(json.dumps(json_data))
    allPostsIDsDict.clear()
    threeLetterDict.clear()
    tempNewWordsDict.clear()
print "DONE"
import urllib, json
import collections
from collections import Counter
import operator
import sys, os
import codecs

__author__ = 'AlexGruber'

word = "סתם"
alpha = (u"א", u"ב", u"ג", u"ד", u"ה", u"ו", u"ז", u"ח", u"ט", u"י",
         u"כ", u"ל", u"מ", u"נ", u"ס", u"ע", u"פ", u"צ", u"ק", u"ר",
         u"ש", u"ת", u"ץ", u"ך", u"ם", u"ן")
path = "testFolder"
mergedJspnPath = "pythonfiles"
rootPath = "c:/"
poli = [line.strip() for line in open(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'potilics.txt'))]
first = True
for num in range(0, 22):
    for num2 in range(0, 22):
        for num3 in range(0, 26):
            word = alpha[num] + alpha[num2] + alpha[num3]
            if first:
                word = "2"
                first = False
            newpath = os.path.join(rootPath, mergedJspnPath, word + ".txt")
            print newpath
            fileOpenWrite = open(newpath, 'a')
            count = 0
            fileOpenWrite.write("[")
            for i in range(0, len(poli)):
                temppath = os.path.join(rootPath, path + poli[i], word + ".txt")
                if os.path.exists(temppath):
                    if count == 0:
                        fileOpenWrite.write("{\"id\":\"" + poli[i] + "\",\"resarr\":")
                    else:
                        fileOpenWrite.write(",{\"id\":\"" + poli[i] + "\",\"resarr\":")
                    count += 1
                    fileREAD = codecs.open(temppath, "r", "utf-8")
                    foundbug = False
                    for line in fileREAD.readlines():
                        for j in range(0, len(line)):
                            if j < len(line) - 1:
                                if line[j] == ']' and line[j + 1] == '[':
                                    foundbug = True
                                    break
                        if foundbug:
                            fileOpenWrite.write("]")
                            break
                        fileOpenWrite.write(line)
                    fileOpenWrite.write('}')
            fileOpenWrite.write(']')
print "DONE"
import urllib, json
import collections
from collections import Counter
import operator
import sys, os
import codecs

__author__ = 'AlexGruber'

perDateCountPath = 'WholeDates'
allWordsPath = 'wordAllCount'
threeLetterWordsPath = 'testFolder'
listOfCheckedWordsPath = "WCheckedList"
rootPath = "c:/"
politicsFile = '\ThreeLetterWords.txt'

wordCountPerPolitics = collections.defaultdict(int)
wordIdPerPolitics = collections.defaultdict(list)
politicsDictWords = collections.defaultdict(dict)
politicsDictId = collections.defaultdict(dict)
dateDict = collections.OrderedDict()
dateMutliArray = []
list_NextUrls = list()
buffer_list = list()
buffer_string = ""
allPostsIDsDict = {}
wordIdDict = collections.defaultdict(list)
politicDictWordsCount = {}
politicDictMilitaryCount = {}
politicDictSahbekimCount = {}
politicDictNarcistCount = {}
politicDictKalkalaCount = {}
politicDictRahlanCount = {}
tempdict = {}
errorCount = 0
testlist = list()
threeLetterList = list()
stuctList = list()
threeLetterDict = collections.defaultdict(list)
dateDictForThreeLetter = {}
avoidWords = [u'ישראל', u'כל', u'את', u'של', u'זה', u'על', u'או', u'גם', u'אז', u'רק',
              'and', 'the', 'of', u'עד', u'אשר', u'כי', u'אם',
              'in', 'to', 'a', 'that', 'is', 'for', 'with', 'are', 'this', 'have', 'The', 'on',
              u'-', 'it', 'from', 'a',
              'at', 'as', u'היא', u'אני', u'לא', u'עם', u'הוא', u'•', u'(', u')', u' ', u'', u"", "", " ",
              u':', u'"', u',', u'–', u'?', u'!', u'.', u'', u'', u'']
militaryWords = (u'צבא', u'צה״ל', u'רמטכ״ל', u'נשק', u'אירן', u'אטום', u'ביטחון',
                 u'עזה', u'ג׳האד', u'מלחמה', u'מבצע', u'סכסוך', u'איראני', u'הביטחון', u'חיזבאללה', u'דאעש', u'קבינט',
                 u'חמאס', u'ג׳איסלאמית', u'טיל', u'קסאמים', u'ג׳קסאם', u'ברזל')
sahbekimWords = (u'אנחנו', u'יחד', u'ביחד', u'שלנו', u'כולנו', u'רובנו', u'רוב', u'עם', u'קבוצה',
                 u'שיתוף', u'שותף', u'צוות')
narcistWords = (u'אני', 'me', 'I')
kalkalaWords = (u'כסף', u'כלכלה', u'עושר', u'עוני', u'העוני', u'קו העוני', u'תקציב', u'בורסה', u'מע״מ', u'מד״ד', u'מדד',
                u'העליון', u'העליון', u'מעמד', u'הביניים', u'משכורות', u'שכר', u'דירות', u'דיור',
                u'נדל״ן', u'קניה', u'מוצרים', u'מילקי', u'קוטג׳', u'ברלין', u'תקציבים', u'מסים', u'מיסים', u'הכנסה',
                u'הכנסות', u'חשבת', u'החשבת', u'תוצר', u'יצוא', u'יבוא', u'תוצר', u'ייצוא', u'ייבוא')
avoidWordsRahlan = (u'עו"ד', u'אנשים', u'ועשיה', u'איש', u'לבית', u'שאן', u'חזק', u'עיר',
                    u'ראש', u'עיריית', u'ראשון', u'לציון', u'רצון', u'שאן', u'חזק', u'עיר')
def getTargetIds(jsonData, list):
    response = urllib.urlopen(jsonData)
    data = json.loads(response.read())
    try:
        if 'next' not in data["paging"]:
            raise ValueError("No data for target")
            raise SystemExit(0)
        else:
            # print data["paging"]["next"]
            list.append(data["paging"]["next"])
            getTargetIds(data["paging"]["next"], list)
    except:
        print "END!!!!"
def createBuffer_Next(url):
    list = ""
    tempDate = ""
    response = urllib.urlopen(url)
    data = json.loads(response.read())
    try:
        parent = data["data"]
        for index in range(len(parent)):
            try:
                dateDict[parent[index]["created_time"]] = [parent[index]["message"], parent[index]["id"]]
                allPostsIDsDict[parent[index]["id"]] = [parent[index]["message"], parent[index]["created_time"]]
                tempdict[parent[index]["created_time"]] = parent[index]["message"]
                list += parent[index]["message"]
            except:
                errorCount = 1
    except:
        print "no parent data"
    dateDict['6-6-6T6-6-6'] = ['end', 'end']
    return list
def createBuffer_Posts(url):
    list = ""
    tempDate = ""
    response = urllib.urlopen(url)
    data = json.loads(response.read())
    try:
        parent = data["posts"]["data"]
        for index in range(len(parent)):
            try:
                dateDict[parent[index]["created_time"]] = [parent[index]["message"], parent[index]["id"]]
                allPostsIDsDict[parent[index]["id"]] = [parent[index]["message"], parent[index]["created_time"]]
                tempdict[parent[index]["created_time"]] = parent[index]["message"]
                list += parent[index]["message"]
            except:
                errorCount = 1
    except:
        print "no parent data"
    return list
def perDateCount(politicFileName):
    tempDate = ""
    tempBuildString = list()
    tempIdList = list()
    runOnce = True
    obj = []
    print "Counting Words"
    print len(allPostsIDsDict)
    tempJson = {"Dates": {}}
    for date, msg in dateDict.iteritems():
        print date
        Constdate = date.split("T")[0].split('-')[0] + ":" + date.split("T")[0].split('-')[1]
        if runOnce == True:
            tempDate = Constdate
            runOnce = False
        if tempDate == Constdate:
            try:
                tempWordsList = msg[0].split()
            except:
                errorCount = 1
            for word in tempWordsList:
                # if not word in avoidWords:
                # word = word.replace(u'.', u"")
                # word = word.replace(u'!', u"")
                # word = word.replace(u'?', u"")
                # word = word.replace(u'-', u"")
                # word = word.replace(u',', u"")
                # word = word.replace(u'"', u"")
                # word = word.replace(u':', u"")
                # word = word.replace(u'(', u"")
                # word = word.replace(u')', u"")
                # word = word.replace(u'*', u"")
                if not word in avoidWords:
                    # print "accpted ", word
                    try:
                        wordIdDict[word].append(msg[1])
                        tempBuildString.append(word)
                    except:
                        errorCount = 1
                # else:
                #     print "not accpted ", word
        else:
            # print tempDate
            JsonTempDate = tempDate
            tempDate = Constdate
            mostCommon = Counter(tempBuildString).most_common(10)
            printHeb = dict(mostCommon)
            sorted_x = sorted(printHeb.items(), key=operator.itemgetter(1), reverse=True)
            for key, value in sorted_x:
                if wordIdDict.has_key(key):
                    obj.append({"Word": key, "Amount": str(value), "array_id": list(set(wordIdDict[key]))})
            tempJson["Dates"][JsonTempDate] = [obj]
            obj = []
            del tempBuildString[:]
            del tempIdList[:]
            del tempWordsList[:]
            wordIdDict.clear()
    # with open(perDateCountPath + '\\' + politicFileName + '.txt', 'w') as outfile:
    with open(os.path.join(rootPath, perDateCountPath, politicFileName + '.txt'), 'w') as outfile:
        json.dump(tempJson, outfile, indent=4)
    tempJson.clear()
    dateDict.clear()
    del tempBuildString[:]
    del tempIdList[:]
    del tempWordsList[:]
    wordIdDict.clear()
    print "finished counting " + politicFileName
def AllwordCounter(wordsList, politicFileName):
    print "Counting Words"
    print len(allPostsIDsDict)
    for keyDict, valueDict in allPostsIDsDict.iteritems():
        tempWordsList = valueDict[0].split()
        id = []
        for word in tempWordsList:
            if not word in avoidWords:
                # word = word.replace(u'.', u"")
                # word = word.replace(u'!', u"")
                # word = word.replace(u'?', u"")
                # word = word.replace(u'-', u"")
                # word = word.replace(u',', u"")
                # word = word.replace(u'"', u"")
                # word = word.replace(u':', u"")
                # word = word.replace(u'(', u"")
                # word = word.replace(u')', u"")
                # word = word.replace(u'*', u"")
                # word = word.replace(u'\'', u"")
                wordCountPerPolitics[word] += 1
                wordIdPerPolitics[word].append(keyDict)
                dateDictForThreeLetter[word] = valueDict[1]
    politicsDictWords[politicFileName] = wordCountPerPolitics
    politicsDictId[politicFileName] = wordIdPerPolitics
    for politic, dictOfWords in politicsDictWords.iteritems():
        print politic
        tempJson = {politicFileName: []}
        for word, count in dictOfWords.iteritems():
            # print word[1]
            tempJson[politicFileName].append({word: count})
            if wordIdPerPolitics.has_key(word):
                if dateDictForThreeLetter.has_key(word):
                    splitAllWordsByThreeLetters(threeLetterList, word, count, wordIdPerPolitics[word], politic, threeLetterDict, wordIdPerPolitics[word], dateDictForThreeLetter[word])
                tempJson[politicFileName].append(wordIdPerPolitics[word])
        printToJsonTempDict(threeLetterDict, politicFileName)
        # with open(allWordsPath + '\\' + politicFileName + '.txt', 'w') as outfile:
        with open(os.path.join(rootPath, allWordsPath, politicFileName + '.txt'), 'w') as outfile:
            json.dump(tempJson, outfile, indent=4)
    print "finished counting " + politicFileName
    wordIdPerPolitics.clear()
    politicsDictWords.clear()
    dictOfWords.clear()
    dateDictForThreeLetter.clear()
    allPostsIDsDict.clear()
def CheckWordsAgainstThelist(wordsList, politicFileName):
    countMilitaryWords = 0
    listWordsMilitaryWords = []
    countSahbekimWords = 0
    listWordsSahbekimWords = []
    countNacistWords = 0
    listWordsNacistWords = []
    countKalkalaWords = 0
    listWordsKalkalaWords = []
    countRahlanWords = 0
    listWordsRahlanWords = []
    print "keys number in the tempDict in CheckWordsAgainstTheList : ", len(tempdict.keys())
    for keyDict, valueDict in tempdict.iteritems():
        tempWordsList = valueDict.split()
        for checkWord in tempWordsList:
            if not checkWord in avoidWords:
                checkWord = checkWord.replace(u'.', u"")
                checkWord = checkWord.replace(u'!', u"")
                checkWord = checkWord.replace(u'?', u"")
                checkWord = checkWord.replace(u'-', u"")
                checkWord = checkWord.replace(u',', u"")
                checkWord = checkWord.replace(u'"', u"")
                checkWord = checkWord.replace(u':', u"")
                checkWord = checkWord.replace(u'(', u"")
                checkWord = checkWord.replace(u')', u"")
                checkWord = checkWord.replace(u'*', u"")
                if checkWord in militaryWords:
                    countMilitaryWords += 1
                    listWordsMilitaryWords.append(checkWord)
                if checkWord in sahbekimWords:
                    countSahbekimWords += 1
                    listWordsSahbekimWords.append(checkWord)
                if checkWord in narcistWords:
                    countNacistWords += 1
                    listWordsNacistWords.append(checkWord)
                if checkWord in kalkalaWords:
                    countKalkalaWords += 1
                    listWordsKalkalaWords.append(checkWord)
                if checkWord in politicByWordsGut:
                    if not checkWord in politicByWords:
                        countRahlanWords += 1
                        listWordsRahlanWords.append(checkWord)
                        # print checkWord, len(politicByWords)
    politicDictMilitaryCount[politicFileName] = {"amount": countMilitaryWords / float(politicDictWordsCount[politicFileName]), "wordsArray": Counter(listWordsMilitaryWords).most_common(3)}
    politicDictSahbekimCount[politicFileName] = {"amount": countSahbekimWords / float(politicDictWordsCount[politicFileName]), "wordsArray": Counter(listWordsSahbekimWords).most_common(3)}
    politicDictNarcistCount[politicFileName] = {"amount": countNacistWords / float(politicDictWordsCount[politicFileName]), "wordsArray": Counter(listWordsNacistWords).most_common(3)}
    politicDictKalkalaCount[politicFileName] = {"amount": countKalkalaWords / float(politicDictWordsCount[politicFileName]), "wordsArray": Counter(listWordsKalkalaWords).most_common(3)}
    politicDictRahlanCount[politicFileName] = {"amount": countRahlanWords, "wordsArray": Counter(listWordsRahlanWords).most_common(3)}
    print politicFileName, countMilitaryWords, countMilitaryWords / float(politicDictWordsCount[politicFileName])
    del tempWordsList[:]
    tempdict.clear()
    del listWordsMilitaryWords[:]
    del listWordsSahbekimWords[:]
    del listWordsNacistWords[:]
    del listWordsKalkalaWords[:]
    del listWordsRahlanWords[:]
def splitAllWordsByThreeLetters(listOfThreeLetters, word, count, wordIds, folderName, tempDictJson, ids, date):
    if len(word) > 2:
        try:
            if word[:3] in listOfThreeLetters:
                tempDictJson[word[:3]].append([{"Word": word, "Amount": str(count), "date": date.split('+')[0], "Array_id": ids}])
        except:
            print "Unexpected error:", sys.exc_info()[0]
    else:
        tempDictJson["2"].append([{"Word": word, "Amount": str(count), "date": date.split('+')[0], "Array_id": ids}])
def printToJsonTempDict(tempDictJson, folderName):
    for keyDict, valueDict in tempDictJson.iteritems():
        newpath = os.path.join(rootPath, threeLetterWordsPath, folderName)
        if not os.path.exists(newpath):
            os.makedirs(newpath)
        with open(os.path.join(rootPath, threeLetterWordsPath, folderName, keyDict + '.txt'), 'a') as outfile:
            json.dump(tempDictJson[keyDict], outfile, indent=4)
    tempDictJson.clear()
def sortAndPrintMilitaryCount():
    sorted_politicDictMilitaryCount = sorted(politicDictMilitaryCount.items(), key=operator.itemgetter(1), reverse=True)
    with open(os.path.join(rootPath, listOfCheckedWordsPath, "militaryCountRecords_1.txt"), 'a') as outfile:
        json.dump(sorted_politicDictMilitaryCount, outfile, indent=4)
    tempPrintJson = {}
    for index in range(3):
        tempPrintJson[sorted_politicDictMilitaryCount[index][0]] = sorted_politicDictMilitaryCount[index][1]
    with open(os.path.join(rootPath, listOfCheckedWordsPath, "militaryCountRecords.txt"), 'a') as outfile:
        json.dump(sorted(tempPrintJson.items(), key=operator.itemgetter(1), reverse=True), outfile, indent=4)
def sortAndPrintSahbekimCount():
    sorted_politicDictSahbekimCount = sorted(politicDictSahbekimCount.items(), key=operator.itemgetter(1), reverse=True)
    with open(os.path.join(rootPath, listOfCheckedWordsPath, "SahbekimCountRecords_1.txt"), 'a') as outfile:
        json.dump(sorted_politicDictSahbekimCount, outfile, indent=4)
    tempPrintJson = {}
    for index in range(3):
        tempPrintJson[sorted_politicDictSahbekimCount[index][0]] = sorted_politicDictSahbekimCount[index][1]
    with open(os.path.join(rootPath, listOfCheckedWordsPath, "SahbekimCountRecords.txt"), 'a') as outfile:
        json.dump(sorted(tempPrintJson.items(), key=operator.itemgetter(1), reverse=True), outfile, indent=4)
def sortAndPrintNarcistCount():
    sorted_politicDictNarcistCount = sorted(politicDictNarcistCount.items(), key=operator.itemgetter(1), reverse=True)
    tempPrintJson = {}
    for index in range(3):
        tempPrintJson[sorted_politicDictNarcistCount[index][0]] = sorted_politicDictNarcistCount[index][1]
    with open(os.path.join(rootPath, listOfCheckedWordsPath, "narcistCountRecords.txt"), 'a') as outfile:
        json.dump(sorted(tempPrintJson.items(), key=operator.itemgetter(1), reverse=True), outfile, indent=4)
def sortAndPrintKalkalaCount():
    sorted_politicDictKalkalaCount = sorted(politicDictKalkalaCount.items(), key=operator.itemgetter(1), reverse=True)
    with open(os.path.join(rootPath, listOfCheckedWordsPath, "kalkalaCountRecords_1.txt"), 'a') as outfile:
        json.dump(sorted_politicDictKalkalaCount, outfile, indent=4)
    tempPrintJson = {}
    for index in range(3):
        tempPrintJson[sorted_politicDictKalkalaCount[index][0]] = sorted_politicDictKalkalaCount[index][1]
    with open(os.path.join(rootPath, listOfCheckedWordsPath, "kalkalaCountRecords.txt"), 'a') as outfile:
        json.dump(sorted(tempPrintJson.items(), key=operator.itemgetter(1), reverse=True), outfile, indent=4)
def sortAndPrintWordCount():
    sorted_politicDictWordsCount = sorted(politicDictWordsCount.items(), key=operator.itemgetter(1), reverse=True)
    tempPrintJson = {}
    for index in range(3):
        tempPrintJson[sorted_politicDictWordsCount[index][0]] = sorted_politicDictWordsCount[index][1]
    with open(os.path.join(rootPath, listOfCheckedWordsPath, "wordsCountRecords.txt"), 'a') as outfile:
        json.dump(sorted(tempPrintJson.items(), key=operator.itemgetter(1), reverse=True), outfile, indent=4)
def sortAndPrintRahlanCount():
    sorted_politicDictRahlanCount = sorted(politicDictRahlanCount.items(), key=operator.itemgetter(1), reverse=True)
    tempPrintJson = {}
    for index in range(3):
        tempPrintJson[sorted_politicDictRahlanCount[index][0]] = sorted_politicDictRahlanCount[index][1]
    with open(os.path.join(rootPath, listOfCheckedWordsPath, "rahlanCountRecords.txt"), 'a') as outfile:
        json.dump(sorted(tempPrintJson.items(), key=operator.itemgetter(1), reverse=True), outfile, indent=4)
print os.path.dirname(os.path.abspath(__file__)) + '\potilics.txt'
politicByWords = []
politicCheckListArray = []
gutlist = []
politicByWordsGut = []
politicsArr = [line.strip() for line in open(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'potilics.txt'))]
threeLetterWords = [line.strip() for line in open(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'ThreeLetterWords.txt'))]
f = codecs.open(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'ThreeLetterWords.txt'), "r", "utf-8")
p = codecs.open(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'politicsName.txt'), "r", "utf-8")
for words in f.readlines():
    testlist.append(words.split())
for word in testlist:
    for cut in word:
        threeLetterList.append(cut)
for words in p.readlines():
    gutlist.append(words.split())
for word in gutlist:
    for cut in word:
        politicByWordsGut.append(cut)
for politic in politicsArr:
    url = "https://graph.facebook.com/" + politic + "?fields=posts.limit(200)%7Bmessage%7D&access_token=1719315164960950%7CnKFpk2SebwixsCQS3y7zQDPA1Ow"
    response = urllib.urlopen(url)
    data = json.loads(response.read())
    url2 = "https://graph.facebook.com/" + politic + "?fields=name"
    response2 = urllib.urlopen(url2)
    data2 = json.loads(response2.read())
    try:
        try:
            politicCheckListArray.append(data2["name"].split())
            for words in politicCheckListArray:
                for word in words:
                    politicByWords.append(word)
                    # print word, len(politicByWords)
        except:
            print "Unexpected error:", sys.exc_info()[0]
        print data["posts"]["paging"]["next"]
        allPostsIDsDict = {}
        buildString = createBuffer_Posts(url)
        getTargetIds(data["posts"]["paging"]["next"], list_NextUrls)
        for url in list_NextUrls:
            buildString += createBuffer_Next(url)
        print "ALL ID's in the allPostIDsDict : ", len(allPostsIDsDict.keys())
        politicDictWordsCount[politic] = len(buildString)
        AllwordCounter(buildString, politic)
        perDateCount(politic)
        CheckWordsAgainstThelist(buildString, politic)
        politicCheckListArray = []
        politicByWords = []
        del list_NextUrls[:]
        buildString = ""  # reset the accumulated string (a del-slice fails on str)
    except:
        print politic
        print "Unexpected error:", sys.exc_info()[0]
sortAndPrintMilitaryCount()
sortAndPrintSahbekimCount()
sortAndPrintNarcistCount()
sortAndPrintKalkalaCount()
sortAndPrintWordCount()
sortAndPrintRahlanCount()
print 'done'
GideonSaarLikud DanonDanny AvigdorLiberman 137409609701165 IsaacHerzog GermanYeshAtid 281431865311442 DeryArye tamarzandberg 118410851589072 TzipiHotovely tzipilivni 154570404606299 MFeiglin Netanyahu 207139259326193 612784452070209 MichaeliMerav OfirAkunis YairLapid Moshekahalon 237683826350051 NaftaliBennett 201172239926506 meircoh 173196886046831 zehavagalon katzisraellikud ShellyYachimovich 102997256411052 207139259326193 YuliEdelstein 341448679281672 297207456997968 steinitzyuval50 142479632494944 OfirAkunis 422665717788216 348254355276420 jakilevi2013 davidamsalemjerusalem 404314496289809 aamarhamad 457794627605948 156632191145120 620206311338749 MichaelsonSonLion 174411199282819 402936269773132 350522538451081 377452172353770 102155496619137 erelmargalit boaztoporovsky 520086018021432 MKHasson 212547315512949 224562257709477 zohirbhlol MichalBiran Trajtenberg PninaTamano 129931940492158 1578176972413229 371426819606751 sbyifat SternElazar LevyYeshAtid NachmanShai YossiYonah 348254355276420 SvetlovaKsenia 394242203948130 OferYeshAtid dovhanin KMZeevElkin AymanOdeh1975 EllibenDahan DabushMeretz Officialbaruchmarzel 450583275090800 258598010822055 318718848205174 YoavGallant EliElalouf 1000293466651925 632771873441351 625040900863570 zohirbhlol AmbassadorOren avihabait 389464081230676 ForerOded 639391266187801 NissanSlomiansky 100000162448100 438831352866459 mottiyogev avihabait orbachnir rontzkiavi oritstrook MKIlanGilon 617957021581571 sharongal100 166156570202888 829660960390410 442386732471209 195437940588631
Send them to me in a separate email; for some reason I can't open them when you send them through GitHub. Just send a new email to me with the scripts.
@akariv @mushon
hey, we've uploaded new scripts (ThreeLetterListGather.py, MainScript.py, potilics.txt, realTimeUpdate.py). Please delete all the current data and rerun the scripts (I need you to delete the old data because of unwanted data that needs to be removed from the server).
Additionally, I've updated the website code, so please re-upload the files index.html, style_new_new.css, Visualization.js, Top3.js.
fixed! :smiley: