kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

PyCharm Debugging #3601

Closed: lordsoffallen closed this issue 8 months ago

lordsoffallen commented 8 months ago

Description

I have followed the instructions at https://docs.kedro.org/en/stable/development/set_up_pycharm.html, and they work when I don't have a custom dataset. As soon as I add one whose type points to, let's say, extras.api.NewDataset (extras is under the src folder), I can't run or debug from PyCharm, although the terminal works. Any ideas why this happens and how to fix it?

Context

Steps to Reproduce

Add a custom dataset to the project and try to run or debug it in PyCharm.

Expected Result

Actual Result

DatasetError: An exception occurred when parsing config for dataset 'laws#api': Class 'extras.api.NewAPIDataset' not found, is this a typo?


Your Environment

datajoely commented 8 months ago

How are you defining api.APIDataset in your project? I'm not sure you need the extras prefix in this class path.

lordsoffallen commented 8 months ago

You mean where the file is located?

The structure is as follows:

src/
  extras/api.py -> APIDataset
  my_project/... -> Kedro and project code

I inherit from the AbstractAPI class from Kedro. The problem is that it works from the CLI: when I do kedro run, it just works, but PyCharm somehow doesn't like the new custom dataset.

I have tried different combinations, but no luck so far.

datajoely commented 8 months ago

Oh, it's a custom one; I thought you were using the first-party dataset.

I usually test this in a Jupyter/IPython session by checking whether you can import it as a Python class. You may find there is a missing __init__.py somewhere. Behind the scenes, we're just using importlib to do the same thing.
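To make the importlib point concrete: a catalog type string such as extras.api.NewAPIDataset is resolved by importing the module part and then looking up the class name on that module. The sketch below shows that general pattern; load_class is a hypothetical helper and a simplification, not Kedro's actual loader, and its error messages are illustrative only.

```python
import importlib


def load_class(class_path: str) -> type:
    """Resolve a dotted path like 'package.module.ClassName' to a class object.

    Simplified illustration of dynamic loading with importlib; this is not
    Kedro's own implementation.
    """
    module_path, _, class_name = class_path.rpartition(".")
    if not module_path:
        raise ValueError(f"{class_path!r} is not a dotted path")
    try:
        module = importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        # The module, or one of its own imports, could not be found on
        # sys.path for the interpreter that is currently running.
        raise ImportError(f"Class {class_path!r} not found: {exc}") from exc
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        # The module imported fine but does not define that name.
        raise ImportError(f"Class {class_path!r} not found in {module_path!r}") from exc
```

Calling load_class("extras.api.NewAPIDataset") from the same interpreter and working directory as the failing run configuration should reproduce the failure if the module is not importable there, which narrows the problem down to the environment rather than the dataset code.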

lordsoffallen commented 8 months ago

Yes, it is a custom one, as I need to scrape the API rather than make a single API call.

The IPython session works (I hooked it into the Python console in PyCharm). I suspect the problem is related to sys.path, since PyCharm runs this init script in its console:

import sys; print('Python %s on %s' % (sys.version, sys.platform))
sys.path.extend(['/home/ftopal/Projects/law-buddy', '/home/ftopal/Projects/law-buddy/src'])
from kedro.ipython import load_ipython_extension
load_ipython_extension(get_ipython())

I see that PyCharm is supposed to support adding the source and project directories to PYTHONPATH, but I have a feeling that is broken somehow.
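PyCharm's Python run configurations have checkboxes for adding content roots and source roots to PYTHONPATH. Reproducing that effect in plain Python helps separate a PyCharm setting problem from an import problem. A minimal sketch, assuming the project root contains a src/ folder as in this project; add_source_roots is a hypothetical helper, not part of Kedro or PyCharm.

```python
from __future__ import annotations

import sys
from pathlib import Path


def add_source_roots(project_root: str | Path) -> list[str]:
    """Prepend the project root and its src/ folder to sys.path.

    Does roughly what the sys.path.extend(...) line in the console script
    does, but prepends instead of appending and skips entries already present.
    Returns the entries that were actually added.
    """
    root = Path(project_root).resolve()
    added = []
    for candidate in (root, root / "src"):
        entry = str(candidate)
        if candidate.is_dir() and entry not in sys.path:
            # Inserted at position 0, so src/ ends up ahead of the root.
            sys.path.insert(0, entry)
            added.append(entry)
    return added
```

Prepending rather than appending means the project's own packages take precedence over any same-named module that appears later on sys.path.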

lordsoffallen commented 8 months ago

Could anyone else with PyCharm test this to see whether it happens to them as well?

datajoely commented 8 months ago

I'm a VS Code user now, but I suspect something in these settings will do the trick:

[Screenshot of the relevant settings]

noklam commented 8 months ago

Can you show your configuration? It works fine for me in PyCharm 2023.3.

If you are running pytest, you need to make sure --no-cov is used, because coverage interacts with the debugger in a weird way, but I guess you are not running tests.

lordsoffallen commented 8 months ago

[Screenshot of my PyCharm run configuration]

I have the same configuration except for "Emulate terminal in output console", which I turned off intentionally because I can't execute anything while debugging when it's on.

I am also using 2023.3.3 :(

So you had a custom dataset referenced in the catalog and it worked while debugging?

noklam commented 8 months ago

I don't think the custom dataset itself matters: in your case the class is "not found", so it never even gets loaded. Do you get the same behavior if you define an arbitrary dataset?

lordsoffallen commented 8 months ago

Kedro's built-in datasets work, and as I said, everything works fine in a terminal, but somehow path resolution fails when debugging. When I have no custom dataset, everything works as expected.

I debugged Kedro's core function, and here is what happens:

The last one should work in conjunction with the appended sys.path, but it doesn't.

I printed sys.path while debugging; the output is as follows:

['',
 '/snap/pycharm-community/364/plugins/python-ce/helpers/pydev',
 '/snap/pycharm-community/364/plugins/python-ce/helpers/third_party/thriftpy',
 '/snap/pycharm-community/364/plugins/python-ce/helpers/pydev',
 '/home/ftopal/Projects/law-buddy',
 '/home/ftopal/Projects/law-buddy/src',
 '/home/ftopal/.cache/JetBrains/PyCharmCE2023.3/cythonExtensions',
 '/home/ftopal/miniconda3/envs/transformers/lib/python310.zip',
 '/home/ftopal/miniconda3/envs/transformers/lib/python3.10',
 '/home/ftopal/miniconda3/envs/transformers/lib/python3.10/lib-dynload',
 '/home/ftopal/miniconda3/envs/transformers/lib/python3.10/site-packages']

noklam commented 8 months ago

If the terminal works but PyCharm doesn't, there are a few things you can check.

First, does this work for you? (This assumes you are in the root directory of your Kedro project.)

kedro ipython  # run in a shell; the lines below go into the IPython session
%load_ext kedro.ipython  # may not be needed if the project is already loaded

from my_project.extras.api import GermanLawAPIDataset

In a typical structure, everything should be INSIDE your Kedro project.

You wrote: "Structure is as follows: src: extras/api.py -> APIDataset my_project/... -> kedro and project code."

This doesn't look correct to me; it should be:

src/
  my_project/
    extras/
      api.py

Last but not least, if the alias doesn't work, you can always use the full path.

Assuming the dataset is importable as my_project.extras.api.XXXDataset (that is, from my_project.extras.api import XXXDataset works), you can copy this full path into type:

some_dataset:
  type: my_project.extras.api.XXXDataset

lordsoffallen commented 8 months ago

Oh god, for some reason the run configurations were using a different environment. 🤦 I fixed that and it works now. My project was using the correct environment, but somehow the run configurations weren't.
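Since the root cause was a run configuration pointing at a different environment, a quick check is to print the running interpreter and whether the dataset module resolves, once from the terminal and once from the PyCharm run configuration, and compare the results. A minimal sketch; describe_interpreter is a hypothetical helper, and extras.api is the module path from this issue.

```python
import importlib.util
import sys


def describe_interpreter(module_name: str) -> dict:
    """Report which interpreter is running and where, if anywhere, a module resolves."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # find_spec imports parent packages of a dotted name; this is raised
        # when one of those parents is missing.
        spec = None
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "module": module_name,
        "found": spec is not None,
        "origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter("extras.api").items():
        print(f"{key}: {value}")
```

If executable or prefix differ between the two runs, the run configuration is using another environment, which matches what happened here.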

With this finding, and debugging finally working, I am closing the issue. Thanks so much for the help! :)

noklam commented 8 months ago

Glad you found a solution 😁