ploomber / soorgeon

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊
https://ploomber.io
Apache License 2.0
78 stars 20 forks source link

display warning when there are output files in task itself #68

Closed Wxl19980214 closed 2 years ago

Wxl19980214 commented 2 years ago

Describe your changes

Parse notebook code, if it contains statement such as open, output warning.

Issue ticket number and link

Closes #16

Checklist before requesting a review

Wxl19980214 commented 2 years ago

working on it

Wxl19980214 commented 2 years ago

Hi, I am currently stuck on how to distinguish actual code with string. For example:

f = open('something') # code f = "open('something)" # str f = 'Today the weather is good and I opened the door.' # str

Only the first one should we output the warning. But what I am doing now is to check whether the line contains key word 'open', which obviously not going to work. If someone wants to mess with the system, they can also do stuff like this:

f = "Path('fool.txt').write_text()"

Is there any good way to detect the difference?

idomic commented 2 years ago

@Wxl19980214 We can automatically ignore strings or comments, I think we can open an issue if we see it's too complex, I don't see a user writing an open command inside a str in purpose"Path('fool.txt').write_text()" , unless it's commented out #Path('fool.txt').write_text().

Wxl19980214 commented 2 years ago

Me2, I don't expect people doing weird stuff. I think parso automatically ignore comments so we got that off the table. One last thing is maybe people can give a variable named write_text? It looks legit. Do we just ignore this case as well, or are we somehow do stuff like string pattern?

idomic commented 2 years ago

Maybe some regex matching, it's write_text(, I'm sure a bit more sophisticated regex will solve this + most of those problems are probably already solved.

Wxl19980214 commented 2 years ago

TBH if we omit the situation where user mess up with our system. Simple contain would suffice? We could just check if statement like these .write_text( , open(, .to_csv( are included in our code

idomic commented 2 years ago

simple contains will suffice?

I think it does.

@Wxl19980214 is this ready for review?

Wxl19980214 commented 2 years ago

@idomic Yes