New node method to report where nodes come from

sergisiso commented 1 year ago

During my recent meetings I received more than once the suggestion that we could improve the error messages, I think the problem is that we often just print the node (view()) causing the error but it difficult to know where it comes from.

In #1587 I proposed the Node.debug_string() to improve some error output, but in addition we could also introduce a Node.origin_string() which which could:

for nodes that come from a source file and have a _ast link print:
```
Node ArrayReference[name:'array1'] from line XX of mysource.f90:
> array(i,j) = scalar + array1(i,j)
                    ^^^^^^ 
```
(Looking at fparser node I found there is the item with the line number and start/end position on this line, so the previous message should be possible. However, I have not found the filename on the fparser node, @arporter @rupertford do you know if the filename is stored somewhere?)
For nodes dynamically created we could use the python inspect capabilities to say where they were created (but I would try to make them user-facing by reporting something in the stack that the user understands) e.g: Node 'ArrayReference[name:'array1']' created in TransformationXXX or Node 'ArrayReference[name:'array1']' created in 'psyclone_script.py' line XX

This is quite a bit of work: getting the information, checking that the _ast links are not broken (they aren't tested), updating error messages in a sensible way... but I think we should start moving on this direction.

@rupertford @arporter @hiker @TeranIvy I will appreciate any opinions on this?

arporter commented 1 year ago

If this is something that has been coming up regularly then I think it is a priority. Especially as we try to broaden the user base of PSyclone. What you say all sounds sensible to me. I don't know whether fparser has the filename since that is handled by the reader rather than the parser. However, it probably isn't too hard to add it (he says optimistically).

hiker commented 1 year ago

I totally agree that this would be useful. I vaguely remember seeing that some objects still had a reader object stored?? A quick look shows that the match functions in Fortran2008 typically have a 'reader' argument ... and I didn't verify what this contains :)

sergisiso commented 10 months ago

See a branch with a method to retrieve the original source code line: https://github.com/stfc/PSyclone/tree/2062_origin_string

As you can see a successful retrieval needs to pass:

if self._ast:
    if hasattr(self._ast, 'item'):
        if hasattr(self._ast.item, 'reader'):

For the first condition, we could improve src/psyclone/psyir/frontend/fparser2.py to make sure we populate all the _ast with a relevant fparser node each time we create a PSyIR node.

@hbrunie Can you try if self._ast.item.span is enough for your case? If it isn't let me know which kind of node it is and what step fails.

@arporter @rupertford I am not sure why some fparser nodes do not have an 'item' and some items do not have a 'reader'.

arporter commented 10 months ago

Looking in fparser.two.utils it seems that we only populate an object's item if we still have a reader object. I think this is because, if the match recurses, we only have a string to match against: https://github.com/stfc/fparser/blob/f43a18fdb9b03a70fa0f5b9a04e3e01b12533360/src/fparser/two/utils.py#L402-L446 However, that doesn't explain why some nodes don't even have the attribute!

rupertford commented 10 months ago

The reader and use of 'item' only go down to the statement level. The reader is statement based so this makes sense. Below this, strings are passed and 'content' is used instead of 'item'. I don't know why the original author used 'content' instead of 'item'. I think we added a helper routine that returns whichever of item and content is set to hide the implementation details but I forget its name.

arporter commented 10 months ago

I think you're confusing items with item there Rupert. item seems to be a separate thing.

hbrunie commented 10 months ago

@sergisiso In my case the function keeps failing on each node. and I get the Reference from line unknown 55 times. I also had to update the function with another condition because I got this error in one of the cases:

 filename = self._ast.item.reader.file.name
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'FortranStringReader' object has no attribute 'file'

My code is just doing this:

def launch_test_origin_string():
    reader = FortranReader()
    node = reader.psyir_from_file("/path/to/some/dummy.F90")
    tosee = [node.children[0]]
    while tosee:
        child = tosee.pop(0)
        print(type(child))
        s =child.origin_string()
        print(s)
        tosee += child.children

Here is the dummy.F90 code:

subroutine test(cff1, cff2, cff3, vrhs, vbar, dvom, drhs, om_v)
    REAL      :: vrhs(0:10, 0:20)
    REAL      :: vbar(0:10, 0:20)
    REAL      :: dvom(0:10, 0:20)
    REAL      :: drhs(0:10, 0:20)
    REAL      :: om_v(0:10, 0:20)
    REAL      :: cff1
    REAL      :: cff2
    REAL      :: cff3
    INTEGER*4 :: i
    INTEGER*4 :: j

    do j = 0, 20, 1
        do i = 0, 10, 1
            vrhs(i,j) = cff1 * vbar(i,j) + cff2 * vbar(i,j) + cff3 * vbar(i,j)
            dvom(i,j) = 0.5d0 * (drhs(i,j) + drhs(i,j - 1)) * om_v(i,j) * vrhs(i,j)
        enddo
    enddo
end

Maybe my usage is not correct ? I am sorry if I missed something obvious.

sergisiso commented 10 months ago

@hbrunie your usage is correct. I fixed some issues, if you pull the branch now it should return the line number from everything inside the loop statements.

However, the loops expressions themself still won't work, we will need to fix https://github.com/stfc/PSyclone/issues/2062#issuecomment-1838424768 in fparser.

rupertford commented 10 months ago

I think you're confusing items with item there Rupert. item seems to be a separate thing.

Oops.

hbrunie commented 10 months ago

@sergisiso it does indeed return the line number from a stmt inside the loop. I still don't understand why the file is 'unknown' but right now this is not a priority to have the file name, as we working with 1 or 2 files test cases. Thanks ! :+1:

sergisiso commented 8 months ago

@hbrunie the filename should also be in the string now (for statements). I have created a PR to put this functionality into the master branch even if it doesn't support non-statements yet. We could open another issue for non-statements if these are needed in the future.

stfc / PSyclone

New node method to report where nodes come from #2062