kevinlawler / kona

Open-source implementation of the K programming language
ISC License

partial fetch from a network connection #523

Closed bakul closed 5 years ago

bakul commented 6 years ago

Under kona

  x:`"google.com"`http 4:"GET /"
  #x
1418

Under k3

  x:`"google.com"`http 4:"GET /"
  #x
51490

The first 1418 bytes seem similar (apart from some per-connection unique data), but after that kona gives up too soon.

tavmem commented 5 years ago

I tried the above commands multiple times using kona.

Mostly, I got the same result as you, 1418 characters. A few times, I got 2836 characters (2 times 1418). Once, I got 14180 characters (10 times 1418).

In the file src/0.c there is a function K _4d_(S srvr,S port,K y) containing these lines:

  C buf[20000];
  n=read(sockfd,&buf,20000);
  r=close(sockfd); if(r)R FE;

The first problem is that the buffer size is 20000.
It can't possibly yield the 51490 result that you got in k3.

The second problem is that the read is yielding inconsistent multiples of 1418. I have not figured out why yet.

bakul commented 5 years ago

I suspect kona may be opening the socket in nonblocking mode, in which case read would return with an EAGAIN error if no data were ready to be read.
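The suspicion above can be checked directly rather than inferred. What follows is a minimal sketch, assuming a POSIX system; fd_is_nonblocking and demo_nonblocking_check are hypothetical helper names, not part of kona. It asks the kernel for the descriptor's status flags and tests O_NONBLOCK.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not in kona): returns 1 if fd is in non-blocking
   mode, 0 if it is blocking, and -1 if the flags cannot be read. */
int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);      /* fetch the file status flags */
    if (flags == -1)
        return -1;                       /* e.g. fd is not an open descriptor */
    return (flags & O_NONBLOCK) ? 1 : 0;
}

/* Usage sketch on a local socket pair: a fresh socket is blocking, and it
   becomes non-blocking only after O_NONBLOCK is set explicitly.
   Returns 0 if both observations hold. */
int demo_nonblocking_check(void)
{
    int sv[2];
    assert(fd_is_nonblocking(-1) == -1); /* invalid descriptor is reported */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    int before = fd_is_nonblocking(sv[0]);
    int flags = fcntl(sv[0], F_GETFL);
    int set_ok = (flags != -1 &&
                  fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != -1);
    int after = fd_is_nonblocking(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return (before == 0 && set_ok && after == 1) ? 0 : 1;
}
```

Calling fd_is_nonblocking(sockfd) just before the read in _4d_ would show whether EAGAIN is even possible on that descriptor.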

tavmem commented 5 years ago

Yes, nonblocking mode might be the problem.

Another possibility from the linux man page:

       read() attempts to read up to count bytes from file descriptor fd
       into the buffer starting at buf.

       On files that support seeking, the read operation commences at the
       file offset, and the file offset is incremented by the number of
       bytes read.  If the file offset is at or past the end of file, no
       bytes are read, and read() returns zero.

The key words are "attempts" and "offset". Does the target file support "seeking"? Maybe multiple reads are necessary.
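A small experiment can illustrate the "attempts" part: on a stream socket, read() hands back whatever has already arrived, even when the buffer could hold far more. This is only a sketch under stated assumptions: it uses a local socketpair instead of a real TCP connection, demo_short_read is a hypothetical name, and the idea that 1418 corresponds to the payload of one network segment is a guess that this thread does not confirm.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: writes `chunk` bytes into one end of a local stream
   socket pair, then asks read() for up to 20000 bytes on the other end.
   Returns what read() reported, or -1 if the setup failed. */
ssize_t demo_short_read(size_t chunk)
{
    static char out[4096], in[20000];
    int sv[2];
    ssize_t n = -1;

    assert(chunk > 0 && chunk <= sizeof out);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    /* Only `chunk` bytes are queued, so that is all read() can return. */
    if (write(sv[0], out, chunk) == (ssize_t)chunk)
        n = read(sv[1], in, sizeof in);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With chunk set to 1418, the single read() returns 1418 although 20000 bytes were requested, which matches the pattern of a lone read() in _4d_ returning only the data that happened to be queued at that moment.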

bakul commented 5 years ago

There should be no difference for a normal file read. This is related to sockets only. Seeking is not relevant. Assuming the socket is non-blocking, you'd have to continue reading until a read returns 0 bytes. Errors related to non-blocking mode (EAGAIN) should be handled. If you do the equivalent of a blocking read, you cannot interrupt it, and such a read can take a very long time depending on the web page, network speed, etc., so it should be possible to interrupt it.

Buffer length doesn't matter as in theory no amount of buffering may be enough. You just have to keep reading it and creating lines.

tavmem commented 5 years ago

Just documenting what I found. I will continue to research this issue ... EAGAIN does not occur.

I modified the function _4d_ to print errno 5 times:

K _4d_(S srvr,S port,K y){
  struct addrinfo hints, *servinfo, *p; int rv,sockfd; S errstr; I r;
  memset(&hints,0,sizeof hints); hints.ai_family=AF_UNSPEC; hints.ai_socktype=SOCK_STREAM;
O("errno0: %d\n",errno);
  if((rv=getaddrinfo(srvr,port,&hints,&servinfo))){fprintf(stderr,"conn: %s\n",gai_strerror(rv)); R DOE;}
O("errno1: %d\n",errno);
  for(p=servinfo; p!=NULL; p=p->ai_next)
    if((sockfd=socket(p->ai_family,p->ai_socktype,p->ai_protocol))==-1)continue;
    else if(connect(sockfd,p->ai_addr,p->ai_addrlen)==-1){errstr=strerror(errno); r=close(sockfd); if(r)R FE; continue;}
    else break;
  if(p==NULL){fprintf(stderr, "conn: failed to connect (%s)\n",errstr); freeaddrinfo(servinfo); R DOE;}
  I n=strlen(kC(y)); C msg[n+5]; I i=0; for(i=0;i<n+1;i++){msg[i]=kC(y)[i];}
  msg[n]='\r'; msg[n+1]='\n'; msg[n+2]='\r'; msg[n+3]='\n'; msg[n+4]='\0';
  if(write(sockfd, &msg, strlen(msg))==-1){r=close(sockfd); if(r)R FE; R WE;}
  C buf[20000];
O("errno2: %d\n",errno); errno=0; O("errno3: %d\n",errno);
  n=read(sockfd,&buf,20000);
O("errno4: %d\n",errno);
  r=close(sockfd); if(r)R FE;
  K z=newK(n==1?3:-3,n); memcpy(kC(z),&buf,n);
  freeaddrinfo(servinfo);
  if(n==0)R _n();
  else R z; }

Just before getaddrinfo, errno is 0.

getaddrinfo sets errno to 101 (ENETUNREACH), although getaddrinfo succeeds (returns 0).

Just before read, errno is still 101.
I reset errno to 0 in case read also sets it to 101 (and reprint errno to verify the reset).

Just after read, errno is still 0. read reports no error at all.

     # `"google.com"`http 4:"GET /"
errno0: 0
errno1: 101
errno2: 101
errno3: 0
errno4: 0
8508
  \\

In this case, 8508 characters were received (6 times 1418).
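The 101 left behind by a successful getaddrinfo is consistent with how errno is specified: its value is only meaningful right after a call that reported failure, and successful calls are allowed to leave a stale or changed value in it. A minimal sketch of that convention follows, assuming POSIX; read_or_errno and demo_stale_errno are hypothetical names, not kona functions.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: returns the byte count (>= 0) on success, or the
   negated errno value on failure. errno is consulted only when read()
   returns -1, so a stale value from an earlier call is never reported. */
ssize_t read_or_errno(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    return n == -1 ? -(ssize_t)errno : n;
}

/* Usage sketch: a stale ENETUNREACH does not affect a successful read,
   while a read on an invalid descriptor reports EBADF.
   Returns 0 if both hold. */
int demo_stale_errno(void)
{
    int p[2];
    char c = 'x', got = 0;

    if (pipe(p) == -1)
        return -1;
    if (write(p[1], &c, 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    errno = ENETUNREACH;                 /* simulate a leftover value */
    ssize_t ok = read_or_errno(p[0], &got, 1);
    close(p[0]);
    close(p[1]);
    ssize_t bad = read_or_errno(-1, &got, 1);
    assert(bad < 0);                     /* failure is always negative */
    return (ok == 1 && got == 'x' && bad == -EBADF) ? 0 : 1;
}
```

Under this convention the errno1 and errno2 lines in the trace above carry no information, because the calls before them did not fail.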

bakul commented 5 years ago

The code shows that the socket is not opened in non-blocking mode. If you replace the read logic with something like the following it should fix this problem.

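  C buf[20000]; n=0;
  do {
    I n1=read(sockfd,&buf[n],sizeof buf-n);  /* append after the bytes already read */
    if(n1==0)break;                          /* 0 means the peer closed the connection */
    if(n1<0){O("errno: %d\n",errno); break;} /* read failed; report why */
    n+=n1;
  } while(n<sizeof buf);                     /* stop once the buffer is full */
  r=close(sockfd); if(r)R FE;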

But this won't be enough, as a website may send an arbitrary amount of data. Even for the test case, 20,000 bytes is too small. You will need to allocate space as necessary.
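One possible shape for "allocate space as necessary" is a buffer that doubles whenever it fills up. This is a sketch in plain C under stated assumptions: read_all and demo_read_all are hypothetical names, the code uses malloc/realloc rather than kona's own K object allocation, and it treats EINTR as retryable, which the loop above does not mention.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads fd until end of file into a malloc'd buffer
   that doubles when full. On success returns the buffer (caller frees it)
   and stores the byte count in *len; on failure returns NULL. */
char *read_all(int fd, size_t *len)
{
    size_t cap = 4096, used = 0;
    char *buf = malloc(cap);

    assert(len != NULL);
    if (buf == NULL)
        return NULL;
    for (;;) {
        if (used == cap) {               /* buffer full: double it */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); return NULL; }
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + used, cap - used);
        if (n == 0)
            break;                       /* end of file or peer closed */
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted by a signal: retry */
            free(buf);
            return NULL;
        }
        used += (size_t)n;
    }
    *len = used;
    return buf;
}

/* Usage sketch: stores `total` patterned bytes in a temporary file and
   reads them back through read_all. Returns 0 if everything matches. */
int demo_read_all(size_t total)
{
    FILE *f = tmpfile();
    size_t len = 0;

    if (f == NULL)
        return -1;
    for (size_t i = 0; i < total; i++)
        fputc((int)(i % 251), f);
    if (fflush(f) != 0 || lseek(fileno(f), 0, SEEK_SET) == -1) {
        fclose(f);
        return -1;
    }
    char *data = read_all(fileno(f), &len);
    fclose(f);
    if (data == NULL)
        return -1;
    int ok = (len == total);
    for (size_t i = 0; ok && i < len; i++)
        ok = ((unsigned char)data[i] == i % 251);
    free(data);
    return ok ? 0 : 1;
}
```

The demo reads from a temporary file only to keep it self-contained; the same loop applies to a socket descriptor, where read() returning 0 means the server closed the connection. Inside _4d_, the result would still have to be copied into a K object sized from the final length.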