Closed gwoodwa1 closed 4 years ago
Full traceback message below:
Traceback (most recent call last):
File "scrapcli_test2.py", line 20, in <module>
result = conn.get_config()
File "/home/gary/genie/lib/python3.6/site-packages/scrapli_netconf/driver/driver.py", line 111, in get_config
response._record_response(raw_response) # pylint: disable=W0212
File "/home/gary/genie/lib/python3.6/site-packages/scrapli_netconf/response.py", line 87, in _record_response
self._record_response_netconf_1_0()
File "/home/gary/genie/lib/python3.6/site-packages/scrapli_netconf/response.py", line 114, in _record_response_netconf_1_0
b'<?xml version="1.0" encoding="UTF-8"?>', b""
File "src/lxml/etree.pyx", line 3237, in lxml.etree.fromstring
File "src/lxml/parser.pxi", line 1896, in lxml.etree._parseMemoryDocument
File "src/lxml/parser.pxi", line 1784, in lxml.etree._parseDoc
File "src/lxml/parser.pxi", line 1141, in lxml.etree._BaseParser._parseDoc
File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
File "<string>", line 1
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1
Hey @gwoodwa1 !
Thanks for opening this. Would you be able to snag the log as well? If you don't want to paste it here you can slack me on ntc slack or email me. I just tested against an MX running 20.1 code and was able to snag the config w/out issue, so I'm not quite sure where to start figuring this one out, but we'll figure it out and get ya sorted!
Carl
Thank you Carl, really like the look of using this. It was my first time experimenting with it. I've attached the log file. Let me know if I can get any more info to help understand this. Also, I tested the same with the Juniper's PyEZ to make sure it would work and it does with that one. PyEZ equivalent is below for comparision.
from jnpr.junos import Device
from lxml import etree
import jxmlease
username='mike'
password='root123'
dev = Device(host='172.21.0.3', user=username, password=password, normalize=True)
dev.open()
rpc = dev.rpc.get_config()
rpc_xml = etree.tostring(rpc, pretty_print=True, encoding='unicode')
print(rpc_xml)
Thanks @gwoodwa1 will take a peak hopefully this afternoon and keep ya posted!
Hey @gwoodwa1
Any chance you could crank that logging up to DEBUG and post it back?
Whats happening (I think) is that we are maybe sending an extra return somewhere or the device is not waiting for returns and just shipping its response back to us(?) -- we normally read the inputs we send off the channel, and THEN send a return. This means we can always just keep reading until the command is done returning data to us and we know all of the output we read is the response to our input... in this case it looks like we actually send the get config rpc but when we read off the channel we are already getting the response from the device:
INFO:scrapli.channel-172.21.0.3:Read: b'\n<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get-config><source><running/></source></get-config></rpc>\n]]>]]><rpc-reply xmlns:junos="http://xml.juniper.net/junos/20.1R0/junos" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101">
I tested manually against an mx on 20.x and it does not return data until the "enter" is sent after the rpc. I also test this regularly against a vsrx and haven't seen this behavior. So I'm hoping the debug logs will help point us in a direction to keep digging!
edit Oh - also could you confirm the versions of scrapli and scrapli netconf you have installed please? Also it may be worth a try w/ develop branch of scrapli netconf... made a few little changes in there recently -- dont think it will have an impact but cant hurt :)
Carl
Thanks Carl, I've put the logging up to DEBUG and attached the file. Seems to be more in there. scrapli.log
I installed it today directly from github. Is there a way to get the version number? I've just installed the latest commit to be sure of the most up to date version. This has not changed the issue. Thanks
Sorry I missed the question there! Yeah you can check the version number w/ pip list | grep scrapli
(or python3 -m pip list | grep scrapli
where "python3" is whatever you use to run stuff -- ex: python3.8 or python3.6) -- should let us know what scrapli and scrapli-netconf versions are installed. You can try to install develop branch like: pip install -e git+https://github.com/scrapli/scrapli_netconf.git@develop#egg=scrapli_netconf
though I dont believe it'll make it work.
What version of junos is on that vmx, and any chance you have another Junos (or other) device handy to try it out with to see if anything new/different happens? I've found another MX w/ 18.4 and tried against that too for kicks with user auth and w/ key auth and it works as expected there too. I definitely see the problem from the logs but am just struggling to reproduce it so I can figure out what's going on :(
Hi Carl, Thanks for checking this. I am using Junos: 20.1R1.11 VMX. I have access to other versions so I will try this with a newer version and an older one too. I also have some friends with access to physical Juniper equipment and VMXs so I'll try it there too and let you know the results.
Versions (below) scrapli (2020.9.26) scrapli-asyncssh (2020.7.4) scrapli-netconf (2020.9.23)
Thanks for your help on this. I think we'll soon get to the bottom of it once I've tried different versions of VMX. Might take me a day or two to come back with the results :)
Awesome you rock sir!! Yeah, we'll figure it out one way or the other!
You could try it w/ asyncssh to see if that works any differently... In theory this should give us basically the same result... if that is the case then I'm kinda wondering if that vmx is behaving badly, if it works, then I think I have a difficult to track down bug somewhere in the system
transport stuff for netconf (difficult just since I cant reproduce yet). Here is a little example that should work for ya:
import asyncio
import logging
import uvloop
from scrapli_netconf.driver import AsyncNetconfScrape
logging.basicConfig(filename="scrapli.log", level=logging.INFO)
logger = logging.getLogger("scrapli")
my_device = {
"host": "172.21.0.3",
"auth_username": "mike",
"auth_password": "root123",
"auth_strict_key": False,
"port": 830,
"transport": "asyncssh"
}
async def main():
conn = AsyncNetconfScrape(**my_device)
await conn.open()
print(conn.server_capabilities)
config_result = await conn.get_config()
print(config_result.result)
uvloop.install()
asyncio.run(main())
Thanks again for sticking with me on this!
Hey Carl, Thanks buddy..!! This has really grabbed my attention now. I've gone to 18.3R3.8 and used your asyncio script and it works beautifully so long as I change to TCP Port 22 instead of TCP Port 830. Now the interesting thing is I have TCP 830 configured for NETCONF in the JunOS config and there are no firewall filters. I'll do some more testing with the different versions using my old script and the new one you provided. Hopefully it will pinpoint a pattern, it might just be my setup at the end of the day. My friend has tried in on a physical SRX today and all works and he was also going to try it on his VMX later. In summary, I'll do some more testing against old script/new script and different JunOS VMX versions. At the very least it is working using asyncio and TCP Port 22. Thanks again, Gary
Progress! :D
Would be interesting to see/hear if port 22 w/ system transport works ok... The fact that the asyncssh transport plugin works seems to me to indicate that I have a bug somewhere with the system transport, but the fact that it works with my test devices and your friends SRX as well maybe means that VMX is just behaving... strangely? (I hate "strange"... I def wanna know what's happening!!)
In any case, glad we are looking up -- thanks for all the troubleshooting!
Carl
The first proper round of testing on Junos: 18.3R3.8 VMX using TCP Port 22 (TCP 830 non-working but that could be something in the device configuration) original script (i.e. non asyncio) - Capabilties works but get-config operation produces the same traceback error asyncio script - All working for both Capabilities and get-config
On Junos: 20.3R1.8 VMX using TCP Port 22 (TCP 830 non-working but that could be something in the device configuration) original script (i.e. non asyncio) - Capabilties works but get-config operation produces the same traceback error asyncio script - All working for both Capabilities and get-config
The behavior seems to be consistent across different versions of JunOS VMX. I just need to ask a friend if he's tried on a VMX too and what he sees happen. I am using a docker container for the Junos but again I can't see if that would be a factor here.
Thanks for the details Gary -- trying to get my hands on VMX download so I can reproduce and troubleshoot... will keep you posted!
Thanks Carl, I can also try and see if there is a way I can debug properly on the Juniper.
Wow... this was a fun one, but I think I've tracked down what's going on... It is my understanding that an rpc gets sent to the device, and that nothing "happens" until a return character is sent (after sending the rpc). In my testing with Cisco IOSXE and IOSXR, and with Junos on vSRX (and real qfx/mx devices), and in some other folks testing with Nokia devices this has proven to either 1) be true, or 2) has never been a problem due to luck/cosmic rays/whatever.... but on the vmx (I did find an eval copy not behind juniper customer portal login thing) this is not the case!
You can probably test this out yourself pretty easily too:
SSH to the device as scrapli does:
ssh 172.18.0.11 -p 22 -l vrnetlab --s netconf
(with your user of course) -- you'll end up getting greeted with the server capabilities, then it is our turn to send capabilities:
<?xml version="1.0" encoding="utf-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
<capability>urn:ietf:params:netconf:base:1.0</capability>
</capabilities>
</hello>]]>]]>
Next we can request the get config rpc:
<?xml version='1.0' encoding='utf-8'?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get-config><source><running/></source></get-config></rpc>
]]>]]>
^ is how scrapli sends the rpc... however it seems that the MX thinks those line breaks means GOGOOGOGO and it starts sending the config straight away... this breaks scrapli basically because we expect to be able to read the input we sent to the device off the channel before we hit return (this also "drains" the inputs we write off the channel so we dont have to deal w/ stripping it out/parsing it/whatever).
If you send the above you should basically see the config returned before you can blink (at least w/ my vmx!)...
Try again, but this time without any return chars in the rpc:
<?xml version='1.0' encoding='utf-8'?><rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101"><get-config><source><running/></source></get-config></rpc>]]>]]>
Until you hit return nothing should happen... as it does on all the other boxes ive tested with! Hit return and bam, of course things work as expected!
So.... to confirm this I wonder if you could test out a very simple little tweak to the base_channel... to find that file to edit you can do:
python -c "from scrapli_netconf.channel import base_channel; print(base_channel.__file__)"
/SOMEPATH/scrapli_netconf/channel/base_channel.py
Line 47 should look like this right now:
return channel_input
Can you try replacing that line with:
return channel_input.replace("\n", "")
Then we can see if that sorts things out.... why this did not seem to cause any issues with asyncssh at this point I have zero idea... perhaps it is faster, or the backend channel reading/writing is different (its been too long for me to recall off the top of my head hah).
Let us know if that gets things looking happier!
Carl
Wow, you've cracked it..!!! Totally makes sense in terms of the logic. I tried the SSH/NETCONF manual method and sure enough it will sit there until you hit the carriage return (as you explained it.) I then made the small tweak to _basechannel.py which has stopped the traceback. However, instead of the get_config result, it came back with Response
🔥 🔥 🔥 🔥 🔥 🔥 🔥 🔥 Awwwwwwww yeah!!!! Phew... that is great!! Thought I was going crazy there for a minute :)
I will need to test this out a bit more before getting it pushed up to develop and further... I'll leave this open until then so I can let ya know easily when things are happening. Hopefully that little patch up will hold ya over till then.
Oh, PS - the reason the async bits work is down to how the underlying asyncssh connection is handled where there is basically a stdin and stdout channel, so the output from the device getting sent "early" was never mixing with our command input and thus not causing us any issues!
Thanks a ton for helping work through this one Gary, this was a fun one :D
Carl
Just to confirm the fully working script for non-asyncio:
from scrapli_netconf.driver import NetconfScrape
import logging
logging.basicConfig(filename="scrapli.log", level=logging.INFO)
logger = logging.getLogger("scrapli")
my_device = {
"host": "172.18.0.2",
"auth_username": "mike",
"auth_password": "root123",
"auth_strict_key": False,
"port": 22,
"transport": "system"
}
conn = NetconfScrape(**my_device)
conn.open()
cap=conn.server_capabilities
print(cap)
result = conn.get_config().result
print(result)
I am so happy you nailed it Carl, I agree it was great fun getting to the bottom of it. I would like to thank you for being so helpful and supportive. Normally, I am fixing networks issues and this is not nearly as much joy as doing this.
Perfect! Yep -- normally you'll probably want to get the result like:
result = conn.get_config()
print(result.result)
As there are other "goodies" in that result object that you may want to investigate -- stuff like start/end/elapsed time, channel input, xml output (as an lxml Element) -- there is also a handy failed
attribute (result.failed
) and a raise_for_status
method that can raise an exception if the rpc failed!
No problemo Gary, was a good adventure :)!
@gwoodwa1 if you get a chance, would you mind testing the current develop branch to see if it fixes this issue. You can install like so: pip install -e git+https://github.com/scrapli/scrapli_netconf.git@develop#egg=scrapli_netconf
hopefully we should be good to go with that little tune up -- testing against IOSXE/IOXSR/Junos seems to be all good with this!
Hi Carl, Just tried it quickly and all working with the new branch on JunOS.
Boom! Thanks Gary! I'm going to go ahead and close this one out now... I'm hoping to check a few more things before merging to master and making another release, so that may be another week or two, but hopefully develop will hold ya over for now!
Thanks again for the help with this!
Carl
Hi,
I just tried to use scrapli netconf to connect to a JunOS router in my lab. I can exchange server capabilities but I get an XML error when doing any operations such as get_config()
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1