Closed gessel closed 3 years ago
I think this is the code that is generating the error (it's the only code with the error message "failed to send packet"):
/*
* Send the packet.
*/
if (verbose > 1)
warn_msg("---\tSending packet #%u to host %s tmo %d", he->num_sent,
my_ntoa(he->addr), he->timeout);
if (write_pkt_to_file) {
nsent = write(write_pkt_to_file, buf, buflen);
} else if (!pkt_read_file_flag) {
for (i=0; i<retry_send; i++) {
nsent = pcap_sendpacket(pcap_handle, buf, buflen);
if (nsent >= 0) {
break;
}
if (retry_send_interval > 0) {
sleep(retry_send_interval);
}
}
}
if (nsent < 0)
err_sys("ERROR: failed to send packet");
What appears to be happening is that pcap_sendpacket() is returning -1 indicating an error. The "Resource temporarily unavailable" part of the message comes from the system error specified by errno and corresponds to the error code EAGAIN which means that the operation would have blocked but nonblocking operation was requested.
The EAGAIN will be generated by the underlying system call that pcap_sendpacket() calls - probably sendto().
I've never seen this error, but I've never tried using arp-scan on a 10G link. You may be able to get more detail by using strace to see the system calls and return codes. E.g.
# strace -e socket,open,close,send,sendto ./arp-scan --localnet
You could also try running arp-scan under gdb which would allow you to examine variables etc after the error has occurred.
As I've not seen this error before and I don't have access to the hardware that appears to cause it, I'd be interested to determine the cause and hopefully a fix or workaround.
sudo ./arp-scan -B 10M 10.0.0.0/8
Interface: ens160, type: EN10MB, MAC: 00:0c:29:46:89:cd, IPv4: 10.101.50.253
Starting arp-scan 1.9.7 with 16777216 hosts (https://github.com/royhills/arp-scan)
10.34.99.100 2c:fa:a2:71:6c:09 Alcatel-Lucent Enterprise
10.36.99.1 00:20:85:ea:56:fa Eaton Corporation
10.0.0.170 54:80:28:58:fe:d8 Hewlett Packard Enterprise
10.0.0.160 00:03:c5:16:35:b1 Mobotix AG
ERROR: failed to send packet: Resource temporarily unavailable
however ...
sudo strace -e socket,open,close,send,sendto -o stracedump.txt ./arp-scan -B 10M 10.0.0.0/8
never seems to error out - and the strace output file gets gigantic really fast - like 98M in a minute or two.
running under gdb successfully failed:
(gdb) run -B 10M 10.0.0.0/8
Starting program: /home/zabbix/projects/arp-scan/arp-scan -B 10M 10.0.0.0/8
Interface: ens160, type: EN10MB, MAC: 00:0c:29:46:89:cd, IPv4: 10.101.50.253
Starting arp-scan 1.9.7 with 16777216 hosts (https://github.com/royhills/arp-scan)
10.36.87.1 ac:cc:8e:88:06:55 Axis Communications AB
10.34.87.5 ac:cc:8e:88:06:65 Axis Communications AB
10.31.87.1 ac:cc:8e:88:06:d0 Axis Communications AB
10.33.87.5 ac:cc:8e:88:06:4d Axis Communications AB
10.0.0.170 54:80:28:58:fe:d8 Hewlett Packard Enterprise
ERROR: failed to send packet: Resource temporarily unavailable
[Inferior 1 (process 1405) exited with code 01]
(gdb) l
81 static int llc_flag=0; /* Use 802.2 LLC with SNAP */
82 static int ieee_8021q_vlan=-1; /* Use 802.1Q VLAN tagging if >= 0 */
83 static int pkt_write_file_flag=0; /* Write packet to file flag */
84 static int pkt_read_file_flag=0; /* Read packet from file flag */
85 static char pkt_filename[MAXLINE]; /* Read/Write packet to file filename */
86 static int write_pkt_to_file=0; /* Write packet to file for debugging */
87 static int rtt_flag=0; /* Display round-trip time */
88 static pcap_dumper_t *pcap_dump_handle = NULL; /* pcap savefile handle */
89 static int plain_flag=0; /* Only show host information */
90 unsigned int random_seed=0;
Anything I can do?
That for doing this, it explains a lot.
I think what's happening is that arp-scan is sometimes trying to send packets quicker than the network driver can process them causing the EAGAIN error.
I beleive the reason you don't see the error when running under strace is because strace is slowing arp-scan down enough that it never sends packets quickly enough to cause the error, whereas gdb doesn't slow it down by the same amount.
I suspect that reducing the requesting bandwidth will reduce the occurance of the failures because higher bandwidths cause shorter delays between packets. I think the cure is to attempt to resend the packet if pcap_sendpacket() fails with EAGAIN, and I'll look into making this change later.
Of course, happy to, thanks for making a very useful tool. Failures correlated with data rates - at 10M they seem pretty predictable, at lower rates far less frequent.
So far, and I'm extrapolating from very limited data, the sample above is typical: it runs for between a few (3 or 4) seconds and up to about 15-20 seconds before throwing the error, but if it runs for more than a minute or so, it seems to complete successfully. I have not tested that many complete runs yet as there are only so many hours..
I'm happy to do more testing.
I've made some changes in git commits 85cc112af92c4f6616e38378db3dd49087afe6ea and 339a60aea5106a7a8bb4c2143781dbf11c5a9e02 to try to address this problem. Could you see if it makes any difference?
The change makes arp-scan try to send the packet up to retry_send times with retry_send_interval ms delay if the packet send fails. By default retry_send is 3 times and retry_send_interval is 10ms but these can be changed with the --retry-send and --retry-send-interval options.
I think I built correctly (no errors, install seemed successful):
# ls -la \usr\local\bin
-rwxr-xr-x 1 root root 206936 Sep 9 12:20 arp-scan
But results are the same:
$ sudo /usr/local/bin/arp-scan -B 10M 10.0.0.0/8
Interface: ens160, type: EN10MB, MAC: 00:0c:29:46:89:cd, IPv4: 10.101.50.253
Starting arp-scan 1.9.7 with 16777216 hosts (https://github.com/royhills/arp-scan)
10.36.99.1 00:20:85:ea:56:fa Eaton Corporation
10.0.0.170 54:80:28:58:fe:d8 Hewlett Packard Enterprise
ERROR: failed to send packet: Resource temporarily unavailable
(gdb) run -B 1M 10.0.0.0/16
Starting program: /usr/local/bin/arp-scan -B 1M 10.0.0.0/16
Interface: ens160, type: EN10MB, MAC: 00:0c:29:46:89:cd, IPv4: 10.101.50.253
Starting arp-scan 1.9.7 with 65536 hosts (https://github.com/royhills/arp-scan)
10.0.0.160 00:03:c5:16:35:b1 Mobotix AG
10.0.0.170 54:80:28:58:fe:d8 Hewlett Packard Enterprise
10.0.0.174 00:50:c2:51:c3:31 TOA Systems
117 packets received by filter, 0 packets dropped by kernel
Ending arp-scan 1.9.7: 65536 hosts scanned in 68.484 seconds (956.95 hosts/sec). 3 responded
[Inferior 1 (process 23003) exited normally]
(gdb) bt full
No stack.
on smaller (/16 instead of /8) scans triggering a fail at 10M isn't reliable but at 100M it happens quite quickly:
(gdb) run -B 100M 10.0.0.0/16
Starting program: /usr/local/bin/arp-scan -B 100M 10.0.0.0/16
Interface: ens160, type: EN10MB, MAC: 00:0c:29:46:89:cd, IPv4: 10.101.50.253
Starting arp-scan 1.9.7 with 65536 hosts (https://github.com/royhills/arp-scan)
10.0.0.170 54:80:28:58:fe:d8 Hewlett Packard Enterprise
10.0.0.160 00:03:c5:16:35:b1 Mobotix AG
ERROR: failed to send packet: Resource temporarily unavailable
[Inferior 1 (process 30357) exited with code 01]
(gdb) bt full
No stack.
Happy to compile with the -g flag.
This may be just noise, but I compiled with CFLAGS='-g -Wall -Wextra' make
and tried:
(gdb) run -B 100M 10.0.0.0/8
Starting program: /usr/local/bin/arp-scan -B 100M 10.0.0.0/8
Interface: ens160, type: EN10MB, MAC: 00:0c:29:46:89:cd, IPv4: 10.101.50.253
Starting arp-scan 1.9.7 with 16777216 hosts (https://github.com/royhills/arp-scan)
10.36.73.4 00:03:c5:16:28:a5 Mobotix AG
10.34.73.18 00:03:c5:16:3b:8e Mobotix AG
10.34.87.4 ac:cc:8e:88:06:44 Axis Communications AB
10.33.99.1 00:20:85:ea:57:21 Eaton Corporation
10.0.0.170 54:80:28:58:fe:d8 Hewlett Packard Enterprise
ERROR: failed to send packet: Resource temporarily unavailable
[Inferior 1 (process 17140) exited with code 01]
(gdb) bt full
No stack.
(gdb) b main
Breakpoint 1 at 0x555555556300: file arp-scan.c, line 95.
(gdb) run -B 100M 10.0.0.0/8
Starting program: /usr/local/bin/arp-scan -B 100M 10.0.0.0/8
Breakpoint 1, main (argc=4, argv=0x7fffffffe538) at arp-scan.c:95
95 main(int argc, char *argv[]) {
(gdb) bt full
#0 main (argc=4, argv=0x7fffffffe538) at arp-scan.c:95
now = {tv_sec = 0, tv_usec = 140737354127856}
diff = {tv_sec = 140737345380584, tv_usec = 12288}
select_timeout = <optimized out>
loop_timediff = <optimized out>
host_timediff = <optimized out>
last_packet_time = {tv_sec = 140737351858688, tv_usec = 1152}
req_interval = <optimized out>
cum_err = <optimized out>
start_time = {tv_sec = 39, tv_usec = 0}
end_time = {tv_sec = 0, tv_usec = 0}
elapsed_time = {tv_sec = 140737488348176, tv_usec = 447542124626712}
elapsed_seconds = <optimized out>
reset_cum_err = <optimized out>
pass_no = <optimized out>
first_timeout = <optimized out>
i = <optimized out>
errbuf = "\230\352\377\367\377\177\000\000X\344\377\377\377\177\000\000\220\344\377\377\377\177\000\000\020\347\377\367\377\177\000\000\000\000\000\000\000\000\000\000\237\343\335\367\377\177\000\000\000\000\000\000\000\000\000\000\220\344\377\377\377\177", '\000' <repeats 26 times>, "\020\347\377\367\377\177\000\000\aG\225\367\377\177\000\000\300\t\000\000\000\000\000\000\260\343\377\377\377\177\000\000\300\343\377\377\377\177\000\000\230\352\377\367\377\177", '\000' <repeats 14 times>, "@\000\000\000\000\002\000\000\000\004\000\000\377\377\377\377", '\000' <repeats 12 times>, "h\242\377\367\377\177\000\000\020\347\377\367\377\177", '\000' <repeats 34 times>...
filter = {bf_len = 0, bf_insns = 0x0}
filter_string = <optimized out>
netmask = 4158470420
localnet = 32767
datalink = <optimized out>
ret_status = <optimized out>
pcap_fd = <optimized out>
interface_mac = "\377\367\377\177\000"
pcap_handle = <optimized out>
interface_ip_addr = {s_addr = 4158469056}
I've made a small change in commit 3b7f32e6cbe443d5f03ebb854f9aec0d99abcc66 which may help: changed the retry send delay from milliseconds to microseconds and added a message when the packet is resent if verbose is specified:
to.tv_sec = retry_send_interval/1000000;
to.tv_usec = (retry_send_interval - 1000000*to.tv_sec);
for (i=0; i<retry_send; i++) {
nsent = pcap_sendpacket(pcap_handle, buf, buflen);
if (nsent >= 0) {
break;
}
if (retry_send_interval > 0) {
if (verbose)
warn_msg("---\tRetrying send after %d microsecond delay",
retry_send_interval);
select(0, NULL, NULL, NULL, &to); /* Delay */
}
}
I have not been able to reproduce the problem, but I think that's because I'm running on a VMware virtual machine which uses virtual Ethernet devices rather than a physical Ethernet adapter. I suspect the virtual adapters are just performing memory copies between the guest and host memory and never experience a situation when the adapter send buffer is full.
I'd be interested to see if the retry is being attempted. If you run arp-scan with e.g. arp-scan -v -B 100M 10.0.0.0/16
you will hopefully see messages like --- Retrying send after 10000 microsecond delay
.
If this logic works then you should be able to adjust the number of retries with --retry-send and the delay between retries with --retry-send-interval.
If you want to investigate the problem with gdb, I think the best way would be to set a breakpoint at the line which calls the function to print the error and exit. This is in function send_packet() and reads err_sys("ERROR: failed to send packet");
In the latest version this is at line 917 but that line number can change as the code is changed, so best to list the function and then set a breakpoint at the appropriate line number, e.g:
(gdb) list send_packet
829 * If we are using the undocumented --readpktfromfile option, then we
830 * don't send anything.
831 */
832 int
833 send_packet(pcap_t *pcap_handle, host_entry *he,
834 struct timeval *last_packet_time) {
835 unsigned char buf[MAX_FRAME];
836 size_t buflen;
837 ether_hdr frame_hdr;
838 arp_ether_ipv4 arpei;
(gdb)
839 int nsent = 0;
840 unsigned i;
841 struct timeval to;
842 /*
843 * Construct Ethernet frame header
844 */
845 memcpy(frame_hdr.dest_addr, target_mac, ETH_ALEN);
846 memcpy(frame_hdr.src_addr, source_mac, ETH_ALEN);
847 frame_hdr.frame_type = htons(eth_pro);
848 /*
(gdb)
849 * Construct the ARP Header.
850 */
851 memset(&arpei, '\0', sizeof(arp_ether_ipv4));
852 arpei.ar_hrd = htons(arp_hrd);
853 arpei.ar_pro = htons(arp_pro);
854 arpei.ar_hln = arp_hln;
855 arpei.ar_pln = arp_pln;
856 arpei.ar_op = htons(arp_op);
857 memcpy(arpei.ar_sha, arp_sha, ETH_ALEN);
858 memcpy(arpei.ar_tha, arp_tha, ETH_ALEN);
(gdb)
859 if (arp_spa_is_tpa) {
860 if (he) {
861 arpei.ar_sip = he->addr.s_addr;
862 }
863 } else {
864 arpei.ar_sip = arp_spa;
865 }
866 if (he)
867 arpei.ar_tip = he->addr.s_addr;
868 /*
(gdb)
869 * Copy the required data into the output buffer "buf" and set "buflen"
870 * to the number of bytes in this buffer.
871 */
872 marshal_arp_pkt(buf, &frame_hdr, &arpei, &buflen, padding, padding_len);
873 /*
874 * If he is NULL, just return with the packet length.
875 */
876 if (he == NULL)
877 return buflen;
878 /*
(gdb)
879 * Check that the host is live. Complain if not.
880 */
881 if (!he->live) {
882 warn_msg("***\tsend_packet called on non-live host: SHOULDN'T HAPPEN");
883 return 0;
884 }
885 /*
886 * Update the last send times for this host.
887 */
888 Gettimeofday(last_packet_time);
(gdb)
889 he->last_send_time.tv_sec = last_packet_time->tv_sec;
890 he->last_send_time.tv_usec = last_packet_time->tv_usec;
891 he->num_sent++;
892 /*
893 * Send the packet.
894 */
895 if (verbose > 1)
896 warn_msg("---\tSending packet #%u to host %s tmo %d", he->num_sent,
897 my_ntoa(he->addr), he->timeout);
898 if (write_pkt_to_file) {
(gdb)
899 nsent = write(write_pkt_to_file, buf, buflen);
900 } else if (!pkt_read_file_flag) {
901 to.tv_sec = retry_send_interval/1000000;
902 to.tv_usec = (retry_send_interval - 1000000*to.tv_sec);
903 for (i=0; i<retry_send; i++) {
904 nsent = pcap_sendpacket(pcap_handle, buf, buflen);
905 if (nsent >= 0) {
906 break;
907 }
908 if (retry_send_interval > 0) {
(gdb)
909 if (verbose)
910 warn_msg("---\tRetrying send after %d microsecond delay",
911 retry_send_interval);
912 select(0, NULL, NULL, NULL, &to); /* Delay */
913 }
914 }
915 }
916 if (nsent < 0)
917 err_sys("ERROR: failed to send packet");
918
(gdb) b 917
Breakpoint 1 at 0x72c0: file arp-scan.c, line 917.
setting the breakpoint at err_sys
(gdb) b 914
Breakpoint 3 at 0x55555555b030: file arp-scan.c, line 914.
(gdb) run -B 100M 10.0.0.0/8
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/local/bin/arp-scan -B 100M 10.0.0.0/8
Interface: ens160, type: EN10MB, MAC: 00:0c:29:46:89:cd, IPv4: 10.101.50.253
Starting arp-scan 1.9.7 with 16777216 hosts (https://github.com/royhills/arp-scan)
10.101.20.42 48:df:37:41:4c:68 Hewlett Packard Enterprise
10.101.50.52 2c:fa:a2:90:8f:b4 Alcatel-Lucent Enterprise
10.201.10.52 d0:67:26:d9:69:7a Hewlett Packard Enterprise
10.31.99.1 00:20:85:e6:f2:ec Eaton Corporation
10.35.87.2 ac:cc:8e:88:06:6a Axis Communications AB
Breakpoint 3, send_packet (pcap_handle=0x555555769a10, he=<optimized out>, last_packet_time=<optimized out>) at arp-scan.c:914
914 err_sys("ERROR: failed to send packet");
(gdb) bt full
#0 send_packet (pcap_handle=0x555555769a10, he=<optimized out>, last_packet_time=<optimized out>) at arp-scan.c:914
buf = "\377\377\377\377\377\377\000\f)F\211\315\b\006\000\001\b\000\006\004\000\001\000\f)F\211\315\ne2\375\000\000\000\000\000\000\n\000\001f\000\000\000\000\000\000h\r\000\000\000\000\000\000\377\317UUUU\000\000\372\317UUUU\000\000\002", '\000' <repeats 15 times>, "|\000\000\000w\000\000\000n\000\000\000[\000\000\000\000\f\254F\003\345\307\300 \000\000\000\000\000\000\000\340\213\230UUU\000\000@̸\367\377\177\000\000P\214\230UUU\000\000\000\000\000\000\000\000\000\000\060\000\000\000\000\000\000\000 \000\000\000\000\000\000\000\245i\203\367\377\177\000\000 \000\000\000\000\000\000\000\b\000\000\000\060\000\000\000\360\213\230UUU\000\000 "...
buflen = 42
frame_hdr = {dest_addr = "\377\377\377\377\377\377", src_addr = "\000\f)F\211", <incomplete sequence \315>, frame_type = 1544}
arpei = {ar_hrd = 256, ar_pro = 8, ar_hln = 6 '\006', ar_pln = 4 '\004', ar_op = 256, ar_sha = "\000\f)F\211", <incomplete sequence \315>, ar_sip = 4247938314,
ar_tha = "\000\000\000\000\000", ar_tip = 1711341578}
nsent = <optimized out>
i = <optimized out>
to = {tv_sec = 0, tv_usec = 0}
#1 0x0000555555556d82 in main (argc=<optimized out>, argv=<optimized out>) at arp-scan.c:589
now = {tv_sec = 1631185573, tv_usec = 404446}
diff = {tv_sec = 1631185573, tv_usec = 404446}
select_timeout = 0
loop_timediff = 2
host_timediff = <optimized out>
last_packet_time = {tv_sec = 1631185573, tv_usec = 404446}
req_interval = 0
cum_err = 178155
start_time = {tv_sec = 1631185568, tv_usec = 672510}
end_time = {tv_sec = 0, tv_usec = 0}
elapsed_time = {tv_sec = 140737488348176, tv_usec = 454402676365008}
elapsed_seconds = <optimized out>
reset_cum_err = 0
pass_no = 0
first_timeout = 1
i = <optimized out>
errbuf = "\230\352\377\367\377\177\000\000X\344\377\377\377\177\000\000\220\344\377\377\377\177\000\000\020\347\377\367\377\177\000\000\000\000\000\000\000\000\000\000\237\343\335\367\377\177\000\000\000\000\000\000\000\000\000\000\220\344\377\377\377\177", '\000' <repeats 26 times>, "\020\347\377\367\377\177\000\000\aG\225\367\377\177\000\000\300\t\000\000\000\000\000\000\260\343\377\377\377\177\000\000\300\343\377\377\377\177\000\000\230\352\377\367\377\177", '\000' <repeats 14 times>, "@\000\000\000\000\002\000\000\000\004\000\000\377\377\377\377", '\000' <repeats 12 times>, "h\242\377\367\377\177\000\000\020\347\377\367\377\177", '\000' <repeats 34 times>...
filter = {bf_len = 20, bf_insns = 0x55555576f440}
filter_string = <optimized out>
netmask = 255
localnet = 10
datalink = <optimized out>
ret_status = <optimized out>
pcap_fd = 3
interface_mac = "\000\f)F\211", <incomplete sequence \315>
pcap_handle = <optimized out>
interface_ip_addr = {s_addr = 4247938314}
(gdb)
sorry, didn't see the update, I'll recompile with the new code and try again.
Looks good! No failure at 100M... (sorry, tmux so no scrollback, but )
555 packets received by filter, 0 packets dropped by kernel
Ending arp-scan 1.9.7: 16777216 hosts scanned in 297.546 seconds (56385.28 hosts/sec). 466 responded
[Inferior 1 (process 18504) exited normally]
I'll recompile without debug symbols and do a few more runs to verify, but I think you got it!
If you're interested, I tried both massscan and zmap for the same process - enumerating all hosts on a sometimes, shall we say "inexpertly" managed LAN, and both missed a lot of hosts, more so at higher data rates, and were much slower at than arp-scan at useful host-miss rates. So thanks for this tool.
No fails scanning 17956864 hosts. Definitely lots of retries, also definitely bandwidth related, piping 2> to errors, and counting with cat errors_100M.txt | grep Retrying | wc -l
at 100M: 17275 errors at 10M: 225 errors
command:
sudo /usr/local/bin/arp-scan -v -g -r 2 -B 10M 10.0.0.0/8 192.168.0.0/16 169.254.0.0/16 172.16.0.0/12 > $ArpScan 2> errors_10M.txt
Looks like this resolved the issue.
That's good news.
So we're seeing approx 0.1% occurances of "Ethernet adapter send buffer full" at 100M which doesn't look too bad. Certainly better to retry these "errors" (which are really temporary resource shortages) a few times rather than just abort with an error.
It still needs a bit of code tidying etc but good to see that we've addressed the underlying problem.
Thank you very much for the fast, expert response.
Running on a linux VM and executing
sudo /usr/sbin/arp-scan -g -r 2 -B 1M 10.0.0.0/8 192.168.0.0/16 169.254.0.0/16 172.16.0.0/12
arp-scan sometimes fails withERROR: failed to send packet: Resource temporarily unavailable
This typically happens early in the scan, but not immediately, say after 5-15 hosts are enumerated, but within minutes rather than the 4-5 hours the full scan takes at-B 1M
.It seems to be more reliable at lower scan rates, as one would expect from the error, but not reliably so - for example scans completed at
-B 10M
(after much less time, of course).Is there some way to make the process robust against this error - there's no indication that fall back does anything and running
-vvv
just generates an unusable amount of data (plus I suspect the hobble of writing to stdio sufficiently slows the process to reduce the risk of barfing).ETH0 is a 10G optical link, so it isn't getting close to saturating the link even at 10M. Also, the system seems quite responsive even at 10M and there's no indication in top that resources are being problematically consumed even at 10M, let alone at 1M.
I tried building master rather than using the distro version - same version number (1.9) so maybe that will help, but I don't see anything referencing this error message.