joelansbro / pipeline

API Pipeline DB middleware

Adjust cleanjob.py #20

Closed jagithub2 closed 2 years ago

jagithub2 commented 2 years ago

This branch does the following:

cleanupjob.py cleans up some of the HTML in the contents file.

keywordjob.py runs basic keyword parsing on the contents file and returns a string of keywords that is sent to the database alongside the article.
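For readers who have not opened the branch, here is a rough sketch of what the two steps could look like using only the standard library. It is not the code in cleanupjob.py or keywordjob.py: the function names `strip_html` and `extract_keywords`, the stop-word list, and the frequency-based ranking are all assumptions made for illustration.

```python
import html
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stop-word list; a real job would likely need a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
              "for", "on", "with", "this", "that"}


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops the tags around them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(content: str) -> str:
    """Return the text of an HTML fragment with tags and entities removed."""
    parser = _TextExtractor()
    parser.feed(content)
    parser.close()
    # Unescape once more in case the content was double-escaped upstream.
    text = html.unescape(" ".join(parser.parts))
    return re.sub(r"\s+", " ", text).strip()


def extract_keywords(text: str, limit: int = 20) -> str:
    """Return the most frequent non-stop-words as a comma-separated string."""
    words = re.findall(r"[a-z][a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return ",".join(word for word, _ in counts.most_common(limit))
```

Joining text nodes with a space keeps words from adjacent elements apart, and decoding entities before tokenizing keeps fragments of escaped markup out of the keyword string.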

output

I ran several Python-related articles through, and maindb.sqlite currently holds the following keywords for them:

┌─────────────────────────────────────────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                title                                │                                                                                                                                                         keywords                                                                                                                                                          │
├─────────────────────────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Append List to CSV File in Python                                   │ dictionary,operations,basics,function.after,discuss,order,steps.first,instruction,contains,save,csv.writer(myfile)writer.writerow(mylist)myfile.close()myfile,different,function,world,language1,aditya,real,opened,know,literal,approach                                                                                 │
│ Improve Your Python With Python Tricks – Real Python                │ myfunc(**dict_vec)1,add(x,fantastic,good,y,statements&quot,emails,you’ll,function,y}>>&gt,functions,can't,intuitive,real,curated,you've,simple,teaches,x,def                                                                                                                                       │
│ Primer on Jinja Templating – Real Python                            │ .render,high_score,adjust,dictionary,course,open(results_filename,changing,enhance,light_or_dark_mode(element,that’ll,message.otherwise,you’ll,it’ll,macros.nav_link(menu_item,tell,throw,function,lt;ul>13,listif,students|sort(attribute="name&quot                                        │
│ Python built-in functions to know                                   │ efficient,callable,apos;index&apos,dictionary,checking.these,apos;__getattribute__&apos,local,int(3,constructors,zip_longest(numeral,print(words)['welcome&apos,numberdivmod,printlenstrintfloatlisttupledictsetrange,you’ll,it’ll,equivalent,deletes,tell,function,bool('')false>>&gt │
│ Application Performance Monitoring AWS Lambda Functions with Sentry │ efficient,lambdafunction,local,setup,details,function,occur,world,greensection,relies,process.select,usethe,building,setup&quot,osimport,button":in,screenshot.looks,recordings,identitynameparameterand,quot;browse                                                                                                 │
│ From Python to Numpy                                                │ goal,algorithmto,course,n_density,discounted,wellcompute,numpy,placements,grid,vectors,vectorization,upper,maze[row][col,heidelberg,won't,function,valueand,dynamics,np.multiply(g,we'll                                                                                                                        │
│ The State Of Python In 2021                                         │ course,illustrious,cache,guaranteed,brought,numpy,computers.$,ansible,awaited,equivalent,swig,stuff,javascript’s,application.python,videos,restructuredtext,attention,facepalm,real,pyyaml                                                                                                                         │
│ Why We Switched from Python to Go                                   │ activity,course,understoodthe,scale,innovative,x201c;defer&#x201d,is elixir,called virtualgo which,builds,you’ll,djrf,mature,function,switched stream’s primary,leave,attention,the errors,goroutines,fun,considered                                                          │
│ Understanding all of Python, through its builtins                   │ quot;higher,lag,logical,inheritence,hex(ord('🐍'))'0x1f40d'>>&gt,int.what,print(f'{index,quot;bytecode&quot,apos;hasattr&apos,all(x,x=5>>&gt,backed,quot;duck,l.e.g.b,it.we,real,refer,assigned,note,dir:>>&gt                                                          │
└─────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

So not a bad run so far. I've also left some notes on the database pass-through: constantly opening and closing the connection makes me wonder whether it would be better to keep it open for a while and close it afterwards.
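One way to explore the connection question is to open the connection once per batch rather than once per article. This is only a sketch under assumptions: the `articles` table, its `title` and `keywords` columns, and the `save_articles` helper are hypothetical and not taken from this repository.

```python
import sqlite3
from contextlib import closing


def save_articles(db_path, articles):
    """Insert all (title, keywords) pairs over a single connection.

    Returns the number of rows in the (hypothetical) articles table.
    """
    # closing() guarantees the connection is closed when the block exits.
    with closing(sqlite3.connect(db_path)) as conn:
        # Using the connection as a context manager commits on success
        # and rolls back if an exception is raised.
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS articles (title TEXT, keywords TEXT)"
            )
            conn.executemany(
                "INSERT INTO articles (title, keywords) VALUES (?, ?)",
                articles,
            )
        return conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
```

Calling `save_articles("maindb.sqlite", batch)` would then pay the open and close cost once per batch. Whether that matters depends on how many articles pass through at a time, so it is worth measuring before changing the current behaviour.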